Abstract

Background: Intronic and intergenic long noncoding RNAs (lncRNAs) are emerging gene expression regulators.
The molecular pathogenesis of renal cell carcinoma (RCC) is still poorly understood, and in particular, few studies
are available on intronic lncRNAs expressed in RCC.

Methods: Microarray experiments were performed with custom-designed arrays enriched with probes for lncRNAs
mapping to intronic genomic regions. Samples from 18 primary RCC tumors and 11 nontumor adjacent matched
tissues were analyzed. Meta-analyses were performed with microarray expression data from three additional human
tissues (normal liver, prostate tumor and kidney nontumor samples), and with large-scale public data for epigenetic
regulatory marks and for evolutionarily conserved sequences.

Results: A signature of 29 intronic lncRNAs differentially expressed between RCC and nontumor samples was
obtained (false discovery rate (FDR) <5%). A signature of 26 intronic lncRNAs significantly correlated with the RCC
five-year patient survival outcome was identified (FDR <5%, p-value <0.01). We identified 4303 intronic antisense
lncRNAs expressed in RCC, of which 22% were significantly (p <0.05) cis correlated with the expression of the mRNA
in the same locus across RCC and three other human tissues. Gene Ontology (GO) analysis of those loci pointed to
'regulation of biological processes' as the main enriched category. A module map analysis of the protein-coding
genes significantly (p <0.05) trans correlated with the 20% most abundant lncRNAs identified 51 enriched GO terms
(p <0.05). We determined that 60% of the expressed lncRNAs are evolutionarily conserved. At the genomic loci
containing the intronic RCC-expressed lncRNAs, a strong association (p <0.001) was found between their transcription
start sites and genomic marks such as CpG islands, RNA Pol II binding, and histone methylation and acetylation.

Conclusion: Intronic antisense lncRNAs are widely expressed in RCC tumors. Some of them are significantly altered in
RCC in comparison with nontumor samples. The majority of these lncRNAs are evolutionarily conserved and possibly
modulated by epigenetic modifications. Our data suggest that these RCC lncRNAs may contribute to the complex
network of regulatory RNAs playing a role in renal cell malignant transformation.

Background

Renal cell carcinoma (RCC) is the most common cancer
of the adult kidney, corresponding to nearly 3% of all adult
malignancies worldwide, and is an important cause of
cancer morbidity and mortality [1]. The clear cell renal cell
carcinoma (ccRCC) subtype is the most prevalent [2],
making it especially important to identify the molecular
changes associated with malignant transformation and
with longer survival [3,4]. Malignant transformation
has been associated with several changes in gene expression
patterns, which are critical to several steps of tumor
progression [5].

Noncoding RNAs (ncRNAs) exceed the number of
protein-coding genes several fold [6], and both microRNAs
(miRNAs; 21-24 nt) and long ncRNAs (lncRNAs; > 200 nt)
are now emerging as key regulators of mammalian
transcription in response to developmental or environmental
signals [7-9]. The lncRNAs are classified based on their
intersection with protein-coding genes: when they map
outside a protein-coding locus they are called long
intergenic ncRNAs (lincRNAs) [9]. Otherwise they are
classified as intronic, and in this case they can be either
sense or antisense with respect to the direction of
transcription of the host protein-coding gene in the locus [9].
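
The classification rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Locus` type, the function name and the coordinates in the usage note are hypothetical, and any transcript that falls inside a protein-coding locus is treated as intronic without checking for exon overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Locus:
    """A genomic interval with a transcription strand ("+" or "-")."""
    chrom: str
    start: int
    end: int
    strand: str


def classify_lncrna(transcript: Locus, coding_genes: List[Locus]) -> str:
    """Label a lncRNA as lincRNA, intronic sense or intronic antisense.

    Assumption: overlap with a protein-coding locus is taken to mean
    "intronic"; exon coordinates are not examined in this sketch.
    """
    for gene in coding_genes:
        same_chrom = gene.chrom == transcript.chrom
        overlaps = transcript.start <= gene.end and gene.start <= transcript.end
        if same_chrom and overlaps:
            if transcript.strand == gene.strand:
                return "intronic sense"
            return "intronic antisense"
    # No protein-coding locus overlaps the transcript: intergenic.
    return "lincRNA"
```

For example, with a hypothetical host gene `Locus("chr1", 1000, 9000, "+")`, a transcript at `Locus("chr1", 2000, 2500, "-")` is labeled intronic antisense, while one at `Locus("chr1", 20000, 20500, "+")` is labeled lincRNA.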

Following the first reports of miRNA expression profiles
associated with different types of cancer [10,11],
several independent studies over the past five years
identified a number of miRNAs differentially expressed in
RCC that are correlated with malignancy [12-18] and
with the classification of RCC subtypes [19,20]. In addition,
a metastasis signature comprising four miRNAs was
recently described for ccRCC [21].

It has become evident that not only miRNAs but also
lncRNAs are important players in cancer [22-27]. Studies
on lncRNA expression have mainly focused on
the lincRNAs [28,29], essentially to simplify their analysis
by avoiding possible complications arising from
overlapping protein-coding genes [30]. Recent
transcriptome sequencing showed that lincRNAs are
aberrantly expressed in a variety of human cancers [31]. A
transcriptome sequencing study of a prostate cancer
cohort identified the lincRNA PCAT1 as implicated in
malignancy progression [32]. In human lung adenocarcinoma,
another lincRNA, MALAT1, has been associated
with tumor metastasis [33] and is overexpressed in five
other types of human cancers [34]. In a rare subtype of
RCC, namely t(6;11) RCC, MALAT1 has been described as
fused to the TFEB gene [35,36]. Recently, it has
been shown that the Xist lincRNA is a potent suppressor of
hematologic cancer in mice [37].

Intronic lncRNAs constitute the major component of
the mammalian ncRNA transcriptome [38], and they are
possibly related to a fine-tuning regulation of gene
expression patterns across the entire genome [39].
Although thousands of putative intronic
lncRNAs have been identified [9,38,40,41], it is yet to be
determined which ones are functional. It is also a challenge
to determine which ones are independently
transcribed and which are by-products of pre-mRNA processing,
with the levels of some of their intronic portions being
independently regulated [38,42]. In fact, the mechanism
of action of only a few intronic lncRNAs has been
characterized in the context of cancer [42-44]. In addition,
a number of studies have reported correlations between
the expression patterns of intronic lncRNAs and cancer,
such as intronic lncRNAs correlated with the degree of
tumor differentiation in prostate cancer [45], and intronic
lncRNAs differentially expressed in primary and metastatic
pancreatic cancer [46] and in dasatinib-treated
chronic myeloid leukemia patients with resistance to
imatinib [47]. In breast and ovarian cancer, Perez et al.
identified 15 aberrantly expressed ncRNAs, of
which at least three are intronic [48]. In renal carcinoma,
studies of long noncoding RNAs are sparse.
Our group previously identified seven intronic
lncRNAs significantly deregulated in a set of six ccRCC
tumor samples when compared with adjacent nontumor
tissues [49]. Using a microarray approach, another study
revealed tumor-associated lincRNAs when comparing
gene expression profiles in six pairs of ccRCC and
adjacent nontumor tissues [50].

In the present work, we focused on the analysis
of unspliced intronic lncRNAs, the least studied class of
lncRNAs, in an attempt to point to possible new key
molecules and pathways involved in renal
carcinogenesis. To analyze gene expression patterns
in tissue samples from RCC patients, we used
two different microarray platforms enriched with
probes for these intronic lncRNAs. We identified intronic
lncRNAs whose differential expression was significantly
correlated with RCC malignancy or with patient survival
outcome. We also identified sets of intronic lncRNAs that
are co-regulated in cis or in trans with protein-coding
mRNAs from genes associated with transcriptional
regulation and with kidney functions. Finally, our data
demonstrate that RCC-expressed lncRNA loci are significantly
associated with CpG islands and histone regulatory
modifications typical of active RNA Pol II-transcribed
genes, and that the intronic lncRNA expression pattern
in RCC is markedly tissue-specific and evolutionarily
conserved.

Results 

Expression signature of intronic long noncoding RNAs 
associated to malignancy in clear cell renal cell carcinoma 

Based on our previous work with kidney tumor samples,
which identified a gene expression signature of 64 genes
associated with ccRCC that included only 7 intronic
lncRNAs [49], we looked for additional intronic lncRNAs
differentially expressed between ccRCC and nontumor
tissues. For this purpose, we analyzed eleven pairs of tumor
(T) and matched adjacent nontumor renal tissue (N)
samples from ccRCC patients. Clinical and pathological data
of each patient are shown in Additional file 1: Table S1.
Gene expression was measured with a non-strand-specific
4 k-element cDNA microarray platform that
interrogates the expression of 722 intronic lncRNAs, 262
lincRNAs and 2371 protein-coding genes [45,49], now
employing an improved T7 RNA-polymerase-based
cRNA linear amplification and labeling protocol, as
described under Methods.

A ccRCC-associated gene expression profile comprising
29 intronic lncRNAs with statistically
significant differential expression was identified by comparing
the expression of tumor and paired nontumor samples from
eleven patients (FDR <5%, 1.5-fold change) (Figure 1).

Figure 1 lncRNA expression signature of malignancy in clear
cell renal cell carcinoma (ccRCC). Heat map of 40 differentially
expressed lncRNAs (rows) identified in 11 ccRCC patients (columns)
(false discovery rate <5%; fold change >1.5). Patient ID numbers
are shown at the bottom. There are 29 intronic lncRNAs (identified by
their host-gene symbols) and 11 lincRNAs. Blue indicates lower
expression, and red, higher expression in tumor (T) tissues in relation
to adjacent nontumor (N) tissues.



To minimize the contribution of each individual patient
sample to the set of significantly altered genes [51], the
statistical analysis included a leave-one-out cross-validation
procedure: one sample was removed
at a time, and each time a new set of significantly altered
genes was determined using the remaining ten samples,
ultimately pointing to the most consistently altered gene
set, which is common to all leave-one-out sets (see Material
and methods for details). In addition, 9 non-annotated
lincRNAs and 2 RefSeq lincRNAs were identified as
significantly differentially expressed, totaling 40 altered
lncRNAs. Among the 40 lncRNAs, 26 were downregulated
and 14 were upregulated in tumors when compared with
nontumor tissues (Figure 1). The list of lncRNAs with
altered expression is shown in Table 1.
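
The leave-one-out step can be illustrated with a minimal Python sketch. The selector used here is a hypothetical stand-in (a mean log2 fold-change cutoff corresponding to 1.5-fold), not the FDR-controlled test actually used in the study; function names and the input layout are likewise assumptions made for illustration.

```python
from math import log2
from typing import Callable, Dict, List, Set

# gene -> {patient ID -> log2(tumor / nontumor) expression ratio}
Ratios = Dict[str, Dict[str, float]]


def fold_change_selector(ratios: Ratios, samples: List[str],
                         min_abs_log2: float = log2(1.5)) -> Set[str]:
    """Hypothetical stand-in for the statistical test: keep genes whose
    mean log2 ratio over the given samples reaches a 1.5-fold change."""
    selected = set()
    for gene, values in ratios.items():
        mean_ratio = sum(values[s] for s in samples) / len(samples)
        if abs(mean_ratio) >= min_abs_log2:
            selected.add(gene)
    return selected


def leave_one_out_consensus(ratios: Ratios, samples: List[str],
                            selector: Callable[[Ratios, List[str]], Set[str]]
                            ) -> Set[str]:
    """Remove one sample at a time, rerun the selection on the rest,
    and keep only the genes selected in every leave-one-out round."""
    consensus = None
    for left_out in samples:
        kept = [s for s in samples if s != left_out]
        hits = selector(ratios, kept)
        consensus = hits if consensus is None else consensus & hits
    return consensus if consensus is not None else set()
```

The design point is the intersection: a gene whose apparent change is driven by a single patient drops out as soon as that patient is the one left out.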

Protein-coding genes differentially expressed in ccRCC 
and meta-analysis of malignancy related genes 

To evaluate our microarray performance, we searched for
protein-coding genes differentially expressed in our renal
cancer samples and compared this set of genes with lists
of protein-coding genes differentially expressed in ccRCC
from nine published studies [5,49,52-58]. We identified a
set of 217 protein-coding genes differentially expressed in
our ccRCC samples relative to nontumor adjacent renal
tissue (FDR <5%, leave-one-out cross-validation, 1.5-fold
change) (Additional file 2: Figure S1; Additional file 3:
Table S2). The meta-analysis is summarized in Table 2
and is described in detail in Additional file 3: Table S2. A
total of 170 (78%) protein-coding genes were found in
common between our signature and those of the nine previous
studies. Most of the genes in common (142/170; 83%)
showed a concordant expression pattern.

Additionally, we looked at the expression of the 11
host protein-coding genes (among the 29 loci of the
intronic lncRNA candidates) for which the array contained
probes for the mRNA. Of these, 10 protein-coding
genes were detected as expressed in RCC (IGFBP7,
RAB31, PAPPA, ARPC1B, PTEN, HDAC5, NR2C2,
MAP2K1, PTPN3, DNAJC3). Only two were detected as
significantly differentially expressed in RCC compared
with nontumor, namely RAB31 (fold-change = 2.2) and
ARPC1B (fold-change = 1.84) (Additional file 3: Table S2).
The fold-change and the direction of change of these
protein-coding genes in tumors are in agreement with the
literature data from the meta-analyses shown below.

Table 1 List of 40 lncRNAs differentially expressed in RCC from the present work: 29 intronic lncRNAs and 11 lincRNAs
(FDR <5%; >1.5-fold change)

GenBank     lncRNA type              RefSeq of                Host gene      Genomic coordinates       FDR (%)  Fold change  Average fold
accession                            host gene                symbol                                            of lncRNA    change of
of probe §                                                                                                                   host gene †

AW835362    intronic                 NM_001553                IGFBP7         chr4:57928550-57929060    0.00     4.88         -1.15 (2/4)
AW881130    intronic                 NM_006868                RAB31          chr18:9711672-9712160     0.00     3.72         2.56 (5/5)
BF881464    intronic                 NM_004930                CAPZB          chr1:19724054-19724494    0.00     2.03         1.34 (3/5)*
AW846722    intronic                 NM_024113                C11orf49       chr11:47169567-47169799   0.00     2.02         -0.34 (2/4)
AW815357    intronic                 NM_153326                AKR1A1         chr1:46029945-46030338    0.00     1.77         -2.20 (3/3)
AW937741    intronic                 NM_003137                SRPK1          chr6:35819568-35820194    2.46     1.7          -1.37 (3/4)
BF350736    intronic                 NM_025228                TRAF3IP3       chr1:209954933-209955401  1.93     1.65         1.39 (4/4)
CK327196    intronic                 NM_002581                PAPPA          chr9:119104917-119105402  1.86     1.63         -3.49 (5/5)
BF743551    intronic                 NM_005720                ARPC1B         chr7:98991157-98991537    1.37     1.6          2.90 (5/5)
AW748493    intronic                 NM_001098634             RBM47          chr4:40563872-40564416    3.16     -1.52        -1.82 (4/4)
CK327206    intronic                 NM_015995                KLF13          chr15:31628837-31629268   2.46     -1.53        1.19 (1/4)
CK327137    intronic                 NM_173355                UPP2           chr2:158886308-158886559  1.37     -1.55        -6.05 (1/2)
BE168993    intronic                 NM_005882                MAEA           chr4:1318234-1318651      1.68     -1.56        -1.63 (2/4)
BE181783    intronic                 NM_001893                CSNK1D         chr17:80226176-80226555   0.75     -1.63        -1.28 (3/3)*
AW836810    intronic                 NM_000314                PTEN           chr10:89630175-89630699   0.00     -1.63        1.74 (4/4)
BG010306    intronic                 NM_005474                HDAC5          chr17:42175003-42175469   0.75     -1.66        0.00 (0/3)
BF327015    intronic                 NM_003298                NR2C2          chr3:15052840-15053222    0.00     -1.7         -0.34 (0/5)
BF882783    intronic                 NM_004924                ACTN4          chr19:39203995-39204367   0.75     -1.74        -0.29 (1/5)
BF357721    intronic                 NM_002755                MAP2K1         chr15:66764897-66765436   0.00     -1.78        1.52 (3/4)
BF360792    intronic                 NM_020387                RAB25          chr1:156032114-156032418  0.75     -1.79        -5.17 (4/4)
CK327077    intronic                 NM_001170704             MBNL3          chrX:131621693-131622042  0.00     -1.9         -0.79 (0/5)
CK327106    intronic                 NM_005781                TNK2           chr3:195591793-195592189  0.00     -1.95        -1.16 (4/4)*
BF768459    intronic                 NM_004924                ACTN4          chr19:39200205-39200785   0.00     -2.01        -0.29 (1/5)*
BF368747    intronic                 NM_006516                SLC2A1         chr1:43409776-43410148    0.00     -2.01        4.43 (5/5)
BE080597    intronic                 NM_002829                PTPN3          chr9:112237298-112237614  0.00     -2.13        -2.15 (5/5)
BF332192    intronic                 NM_017890                VPS13B         chr8:100419550-100419768  0.00     -2.13        1.37 (3/5)
BE168995    intronic                 NM_018253                YY1AP1         chr1:155656314-155656660  0.00     -2.23        n.d.
CK327034    intronic                 NM_015575                GIGYF2         chr2:233592945-233593379  0.00     -2.45        1.16 (2/4)
BF368584    intronic                 NM_006260                DNAJC3         chr13:96432041-96432369   0.00     -2.81        1.74 (1/1)
BF368636    lincRNA1 (RefSeq ncRNA)  NR_028288                TCL6           chr14:96131134-96131552   0.00     1.92         -
AW880409    lincRNA2 (RefSeq ncRNA)  NR_003255 and NR_001564  TSIX and XIST  chrX:73042786-73043127    1.95     1.74
AW880864    lincRNA3                 n.a.                     n.a.           chr9:18430899-18431447    0.41     1.68
BF987841    lincRNA4                 n.a.                     n.a.           chr14:53107162-53107542   1.12     1.64
AW880828    lincRNA5                 n.a.                     n.a.           chr2:26955660-26956225    0.78     1.63
BF333219    lincRNA6                 n.a.                     n.a.           chr12:49324983-49325465   0.75     -1.63
BG009895    lincRNA7                 n.a.                     n.a.           chr21:19119867-19120311   0.00     -1.74
BE710971    lincRNA8                 n.a.                     n.a.           chr17:18176339-18176682   0.00     -1.93
BE718437    lincRNA9                 n.a.                     n.a.           chr17:56595754-56596085   0.00     -1.94
BF333731    lincRNA10                n.a.                     n.a.           chr17:62118605-62119046   0.00     -2.19
AW996872    lincRNA11                n.a.                     n.a.           chr15:58887770-58888280   0.00     -2.45

§ Probes are double-stranded cDNA, with the sequences that are indicated in the EST accession numbers from GenBank.

† Host protein-coding genes expression from a meta-analysis of ccRCC gene expression studies [54-58] obtained with the Oncomine™ Software tool: average fold
change (FC) values in tumor relative to nontumor tissues of all studies are shown. The numbers in bold indicate FC values in the range -1.5 > FC > 1.5. The numbers in
parenthesis represent: number of studies with significant differential expression of that gene/number of studies that have detected the expression of that gene.
* For this protein-coding gene more than one probe was present in some of the studies, and these additional probes showed an opposite expression pattern in
the Oncomine Software. We show the most frequent pattern for this gene among all studies.

Because of the limited representation of protein-coding
genes in the 4 k array, we performed a meta-analysis with
the ccRCC microarray studies in the literature [5,49,52-58],
looking for protein-coding genes differentially expressed
in the loci of the 29 intronic lncRNA candidates of our
study. Of the 29 protein-coding genes, 28 were detected
as expressed in at least one study included in our
meta-analysis (Table 1, last column). Among them, we
identified 13 genes with significantly altered expression in
tumor compared to nontumor, displaying absolute fold-changes
greater than 1.5, of which 7 were altered in the same
direction as the intronic lncRNA (concordantly changed)
and 6 were altered in the opposite direction (inversely
changed) (Table 1, last column).
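
As a cross-check, the counts in the preceding paragraph can be re-derived from the last two numeric columns of Table 1. The Python sketch below is only an illustration: the values are transcribed from Table 1 (YY1AP1 is left out because its host-gene value is listed as n.d.), the helper name is hypothetical, and only the fold-change cutoff is applied, not the per-study significance calls.

```python
# (host gene, lncRNA fold change, average host-gene fold change),
# transcribed from Table 1; YY1AP1 omitted (host-gene value "n.d.").
TABLE1 = [
    ("IGFBP7", 4.88, -1.15), ("RAB31", 3.72, 2.56), ("CAPZB", 2.03, 1.34),
    ("C11orf49", 2.02, -0.34), ("AKR1A1", 1.77, -2.20), ("SRPK1", 1.7, -1.37),
    ("TRAF3IP3", 1.65, 1.39), ("PAPPA", 1.63, -3.49), ("ARPC1B", 1.6, 2.90),
    ("RBM47", -1.52, -1.82), ("KLF13", -1.53, 1.19), ("UPP2", -1.55, -6.05),
    ("MAEA", -1.56, -1.63), ("CSNK1D", -1.63, -1.28), ("PTEN", -1.63, 1.74),
    ("HDAC5", -1.66, 0.00), ("NR2C2", -1.7, -0.34), ("ACTN4", -1.74, -0.29),
    ("MAP2K1", -1.78, 1.52), ("RAB25", -1.79, -5.17), ("MBNL3", -1.9, -0.79),
    ("TNK2", -1.95, -1.16), ("ACTN4", -2.01, -0.29), ("SLC2A1", -2.01, 4.43),
    ("PTPN3", -2.13, -2.15), ("VPS13B", -2.13, 1.37), ("GIGYF2", -2.45, 1.16),
    ("DNAJC3", -2.81, 1.74),
]


def direction_summary(rows, threshold=1.5):
    """Split host genes with |fold change| > threshold into those changed
    in the same direction as the lncRNA and those changed inversely."""
    altered = [row for row in rows if abs(row[2]) > threshold]
    concordant = [g for g, lnc, host in altered if (lnc > 0) == (host > 0)]
    inverse = [g for g, lnc, host in altered if (lnc > 0) != (host > 0)]
    return len(altered), concordant, inverse
```

With the values as transcribed here, `direction_summary(TABLE1)` returns 13 altered host genes, of which 7 are concordant and 6 inverse, matching the counts given in the text.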

Intronic lncRNA expression profile is correlated to the
patient survival outcome in RCC

Next, we asked if there was a signature of intronic
lncRNAs associated with the patient survival outcome in
ccRCC. We considered the lncRNA expression data of
the paired and unpaired tumor samples from sixteen
ccRCC patients who had a cancer-specific death or were
disease-free within a 5-year follow-up after surgery
(Additional file 1: Table S1). A supervised statistical
analysis identified a 26-gene intronic lncRNA expression
profile (Additional file 4: Figure S2; Additional file 5:
Table S3) that was significantly correlated with the patient
survival outcome (SAM statistical test, FDR <5%,
combined with Golub's discrimination score, p <0.01; see
Material and methods for details). No lincRNAs in the
array were identified as correlated with survival. Most of
the altered intronic lncRNAs present in this signature
(24/26, i.e. 92%) were downregulated in the disease-free
group. Patient status (PS, Additional file 4: Figure S2B,
first line) refers to the disease outcome within the 5-year
follow-up after surgery, and it should be noted that it
was the sole criterion used for the supervised statistical
analysis of correlated lncRNA expression. For comparison,
an additional eight clinical and pathological parameters
related to each patient are shown (Additional file 4: Figure S2).
Interestingly, a set of eight intronic lncRNAs was detected
in common, both in the patient survival profile and in the
ccRCC-associated gene expression profile described above
(Additional file 5: Table S3, last column). Validation of the
patient survival profile with an independent, larger patient
cohort is warranted.
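
For readers unfamiliar with Golub's discrimination score, the sketch below shows its usual signal-to-noise form, (mean_a - mean_b) / (sd_a + sd_b), for one gene measured in two patient groups. It is an illustration only: the function name is hypothetical, the use of the sample standard deviation is an assumption, and the way the score was combined with SAM and converted to a p-value in this study is described in the Methods, not here.

```python
from statistics import mean, stdev
from typing import Sequence


def golub_score(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Signal-to-noise discrimination score for one gene:
    difference of group means divided by the sum of group standard
    deviations (sample standard deviation assumed)."""
    spread = stdev(group_a) + stdev(group_b)
    if spread == 0:
        raise ValueError("score undefined when both groups have zero spread")
    return (mean(group_a) - mean(group_b)) / spread
```

A large absolute score means the gene separates the two outcome groups well relative to its within-group variability; the sign indicates in which group expression is higher.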

Real-time qPCR validation assay 

To further validate the microarray expression data, we
performed independent measurements of RNA abundance
in tumor samples using real-time quantitative
PCR. The limited amount of RNA available from patient
samples was a challenge, and we selected only eight
intronic lncRNA candidates for these assays. In
addition, because RNA was not available from all patients,
we could only test a fraction of the cohort. Four
lncRNAs mapping to intronic regions, namely ncC11orf49,
ncHDAC5, ncRAB31 and ncSRPK1, showed a significant
(p <0.05) differential expression between tumor and nontumor
paired samples as measured by qPCR (Figure 2A-D);
transcripts from these four loci showed an expression
pattern comparable to that observed in the array, thus
corroborating the differential expression observed in the
microarray analysis (Pearson correlation r = 0.96).
Real-time qPCR measurements for transcripts from four other
intronic regions (ncACTN4, ncIGFBP7, ncMAP2K1,
ncPTEN) presented high expression variability and could
not be validated (data not shown).



Figure 2 Relative quantification and transcriptional orientation of intronic lncRNAs differentially expressed in ccRCC. Expression of
(A) ncC11orf49, (B) ncHDAC5, (C) ncRAB31 and (D) ncSRPK1 was evaluated in tumor and adjacent nontumor paired samples from clear cell RCC
patients by qPCR. Tumor expression relative to paired nontumor in each patient sample is shown. lncRNA expression was normalized by HPRT1
gene expression. The statistical significance of the differential expression was evaluated by the t-test (p < 0.05). (E) For each gene, strand-specific
reverse transcription (RT) followed by PCR shows the presence of intronic messages transcribed from the antisense (AS) and/or the sense (S)
strands, in a pool of 10 ccRCC tissues, in a pool of 10 matched nontumors, or in the 786-0 tumor and the RC-124 nontumor kidney cell lines. A
control (C) for the absence of RNA self-annealing during reverse transcription and for the absence of genomic contamination was performed with
an RT reaction step without primer (+ RT, - RT primer), followed by PCR with the pair of primers for the corresponding lncRNA.

Table 2 Summary of meta-analysis of the 217 protein-coding genes ccRCC signature from the present work with nine
publicly available microarray studies comparing tumor and nontumor tissue samples from ccRCC patients

Study                    # of genes in common   # of concordant genes §   % of concordance
Takahashi et al. [52]    4                      4                         100.0
Skubitz et al. [5]       2                      2                         100.0
Higgins et al. [56]*     35                     29                        82.9
Lenburg et al. [54]*     93                     78                        83.9
Liou et al. [53]         4                      4                         100.0
Jones et al. [55]        42                     38                        90.5
Gumz et al. [57]*        109                    94                        86.2
Beroukhim et al. [58]    1                      1                         100.0
Brito et al. [49]        29                     29                        100.0

§ Genes differentially expressed in the same direction (up or down) of that from our study.

* Lists of differentially expressed genes (p <0.05) obtained from Oncomine™ database. The lists of differentially expressed genes from the other studies in this
Table were obtained directly from the published papers (see References).
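
The concordance percentages in Table 2 follow directly from the two count rows. The short Python sketch below recomputes them; the counts are transcribed from Table 2 and the helper name is hypothetical.

```python
# (study, genes in common, concordant genes), transcribed from Table 2.
TABLE2 = [
    ("Takahashi et al. [52]", 4, 4),
    ("Skubitz et al. [5]", 2, 2),
    ("Higgins et al. [56]", 35, 29),
    ("Lenburg et al. [54]", 93, 78),
    ("Liou et al. [53]", 4, 4),
    ("Jones et al. [55]", 42, 38),
    ("Gumz et al. [57]", 109, 94),
    ("Beroukhim et al. [58]", 1, 1),
    ("Brito et al. [49]", 29, 29),
]


def concordance_percent(common: int, concordant: int) -> float:
    """Percentage of shared genes that change in the same direction."""
    if common <= 0:
        raise ValueError("no genes in common")
    return round(100.0 * concordant / common, 1)


percentages = [concordance_percent(c, k) for _, c, k in TABLE2]
```

Note that the per-study counts of genes in common cannot simply be summed to obtain the 170 genes mentioned in the text, since the same gene may be shared with more than one study.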



Transcriptional orientation assay 

For the four intronic lncRNAs ncC11orf49, ncHDAC5,
ncRAB31 and ncSRPK1, whose differential expression in
ccRCC was validated by RT-qPCR assay, transcriptional
orientation (sense and/or antisense) was determined by
strand-specific reverse transcription followed by PCR
(Figure 2E) in the ccRCC and nontumor patient tissues.
Three loci showed evidence of both sense and antisense
messages (ncC11orf49, ncHDAC5 and ncSRPK1). For the
ncRAB31 locus, only a transcript with the same (sense)
orientation as the corresponding protein-coding mRNA
was detected (Figure 2E). The pattern of strand-oriented
expression detected in human kidney tissues (pool of
ccRCC or nontumor samples) was reproduced in human
kidney cell lines originating from tumor (786-0) and
nontumor (RC-124) tissue (Figure 2E). To ensure that the
strand-oriented determinations were free from technical
artifacts, a control for the absence of self-annealing during
reverse transcription (RT) and for the absence of
genomic contamination was performed with the RT
reaction step without any primer, followed by PCR with
the pair of primers for the respective lncRNA; no products
were detected in the controls (Figure 2E, control
lanes).



ncHDAC5: characterization of the intronic lncRNA
decreased in RCC

The intronic lncRNA ncHDAC5, which is expressed
from the locus of the regulatory histone-modifying enzyme
HDAC5, was chosen for a more detailed characterization,
because we identified it as decreased in the
malignancy profile and increased in the patients with a
poor survival outcome. We extended the ncHDAC5 transcript
by 3'- and 5'-RACE-PCR with a fetal kidney RACE
library, sequenced the products and determined the
ncHDAC5 RNA expressed in kidney to be 1695 nt long
(GenBank Accession JX899681). Stability of the ncHDAC5
transcript was examined by the actinomycin-treatment
assay, revealing a half-life of 42 min in the 786-0 kidney
tumor cell line (Additional file 6: Table S4A).

Figure 3 Characterization of the intronic lncRNA expressed from the HDAC5 locus. (A) Relative abundances of the ncHDAC5 lncRNA (light
blue) and of the HDAC5 protein-coding mRNA (dark blue) are shown as fold change in the tumor relative to the matched nontumor sample for
each of ten ccRCC patients. Patients are ordered according to the fold change of the ncHDAC5. (B-F) Regulatory and conserved elements from the
ENCODE database are shown at the genomic region of the HDAC5 protein-coding gene from intron 3 to intron 11. Arrowheads in (B) show the
opposing directions of transcription of the HDAC5 and the ncHDAC5 RNAs. In (C) the RNA Polymerase II binding sites measured in 14 cell lines,
and the CTCF transcriptional repressor insulator binding site are shown. In (D) the histone modification marks H3K27ac, H3K4me3, H3K4me1,
H3K36me3 and H3K27me3 are shown. In (E) the HMM histone state segmentation annotation is shown, comprising a predicted active promoter
(red), a strong enhancer (orange) and an insulator (blue) region. In (F) the vertebrate conservation and the CpG islands tracks are shown (no
marks detected in the latter). (G) The most stable conserved secondary structure predicted by the RNAz tool (P = 0.99) for a segment within
ncHDAC5. The segment spans 110 nt along the 1.7 kb-long lncRNA transcribed in the antisense direction in the HDAC5 locus.

The abundances of the ncHDAC5 lncRNA and of the
HDAC5 protein-coding mRNA were measured in paired
tumor and nontumor samples from ten ccRCC patients
and are shown in Figure 3A. For the
majority of patients (7/9) the expression level of
ncHDAC5 was significantly lower (p <0.05) in tumor
than in nontumor tissues (fold change relative to nontumor
< 1) (Figure 3A, light blue). On the other hand, the
protein-coding gene expression in tumor did not show a
significant pattern of change relative to nontumor (p = 0.18),
the fold-change varying widely from 0.3 to 2.3 (Figure 3A,
dark blue). These qPCR results confirm the 4 k-array
expression measurements of HDAC5 mRNA, which
showed no significant changes in tumor compared with
nontumor (see above). The expression of HDAC5 mRNA
was not correlated with the expression of ncHDAC5 RNA
(Pearson r = 0.41, p = 0.23), which suggests that the mRNA
and the lncRNA are independently transcribed and/or
independently regulated.
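
The correlation test quoted above can be sketched with the Python standard library alone. This is a minimal illustration: the function names are hypothetical, the study's own software is not specified in this section, and the sample size n = 10 used in the usage note is an assumption based on the ten patients mentioned in the text.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length, at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)
```

For the reported r = 0.41 and an assumed n = 10, `t_statistic(0.41, 10)` is about 1.27; a two-sided p-value would then be read from a t distribution with 8 degrees of freedom, which is not computed in this sketch.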

To further characterize the intronic ncHDAC5 lncRNA,
we searched public genomic databases [59-61] for
marks of expression regulation in the genomic locus
of HDAC5 from intron 3 to intron 11 (Figure 3B-F). We
identified RNA Pol II binding exclusively on intron 3, in
the vicinity of exon 4, in 14 different cell lines (Figure 3C).
Further downstream on intron 3, we found CTCF
transcriptional repressor insulator binding (Figure 3C); it is
known that insulators limit the activity of promoters and
enhancers to certain functional domains. In addition, we
identified the active enhancer-associated
histone mark acetylation of histone 3 lysine 27 (H3K27ac)
and the promoter-associated histone mark trimethylation
of histone 3 lysine 4 (H3K4me3) co-localized with the
RNA Pol II binding site (Figure 3D). The regulatory-element-associated
monomethylation of histone 3 lysine
4 (H3K4me1) as well as the active-transcription-associated
histone mark trimethylation of histone 3 lysine 36
(H3K36me3) were identified along the genomic region
encompassing the intronic ncHDAC5 (Figure 3D). The
repressive mark trimethylation of histone 3 lysine 27
(H3K27me3) was detected at low abundance in this locus,
at a frequency similar to that of the exonic regions of the
HDAC5 gene, as expected for actively transcribed regions
(Figure 3D). In fact, the HMM histone state segmentation
analysis (Figure 3E) predicts an active promoter (red) at
the left-hand part of the ncHDAC5 locus, a strong enhancer
region (orange) in the middle, and an insulator region
(blue) at the right-hand side. Taken together, these
ENCODE data suggest that the regulatory elements
present in the locus, along with RNA Pol II, can drive the
transcription of ncHDAC5 in the antisense direction, with
the ncHDAC5 TSS in the vicinity of the RNA Pol II
binding site, as indicated in Figure 3B. It is likely that the
sense transcript detected by strand-specific RT-qPCR in
this intronic locus reflects the presence of HDAC5
pre-mRNA that may originate an independently regulated
intron segment [38].

In addition, we determined that the genomic region
upstream of the ncHDAC5 putative TSS and within its
transcription locus is evolutionarily conserved in vertebrates
(Figure 3F). On the other hand, CpG islands were
not detected upstream of or within the ncHDAC5 genomic
region (Figure 3F). There was no evidence that ncHDAC5
is a precursor of small RNAs, because no miRNA or
snoRNA with sequence identity to the lncRNA was
found in the public databases [62,63].

Finally, ncHDAC5 showed five distinct regions
(ranging from 79 to 114 nt in length) where evolutionarily
conserved secondary structures were predicted by the
RNAz tool (P > 0.5) (Additional file 6: Table S4B); the
most significant secondary structure (P = 0.99), covering
110 nt, is transcribed in the antisense direction, and its
predicted folding is shown in Figure 3G.

Functional associations of intronic antisense lncRNAs and
protein-coding mRNAs in RCC

To extend the study of intronic antisense lncRNAs
expressed in RCC we used a custom-designed 44 k
oligoarray platform that allowed the detection of
strand-specific expression in the intronic loci. It contains
10,525 single-stranded 60-mer oligonucleotide probes,
interrogating about 15-fold more intronic loci than
the array that we had used in the previous experiments.
We focused on the antisense intronic lncRNAs, excluding
the sense intronic ncRNAs from further analyses, because
the antisense messages are assumed to be transcribed
independently from the protein-coding genes in the loci.
The majority of RCC cases interrogated using this 44 k
oligoarray were of the clear cell subtype studied above (14
cases), and there were also papillary (2 cases) and
chromophobe (1 case) subtypes; these seventeen tumor
samples were randomly split into four pools, as indicated in
Additional file 7: Table S5.

We identified 4303 antisense intronic lncRNAs as
expressed in RCC from 3102 protein-coding gene loci
(Additional file 8: Table S6). To verify their predicted
noncoding status, we used the software Coding Potential
Calculator (CPC) [64]. The CPC analysis pointed to a
lack of protein-coding potential for nearly all intronic
antisense transcripts tested (4293/4303, 99.8%) (Additional
file 8: Table S6). This finding indicates that the vast
majority are indeed noncoding RNAs. To better describe
our set of lncRNAs, we cross-referenced it with RefSeq
annotations at the UCSC database (http://genome.ucsc.edu/).
We found six RNAs (0.14%) already annotated as
noncoding RNAs (Additional file 8: Table S6), indicating
that the vast majority of our set are novel unannotated
intronic antisense ncRNAs. To investigate whether these are
possible precursors of small RNAs, we cross-referenced the
genomic coordinates of our 4303 antisense lncRNA set to
snoRNA [62] and microRNA [63] datasets. Because microRNA
precursor lengths are on average > 1,000 nt, we extended
the lncRNA genomic coordinates by 1 kb at both
the 3'- and 5'-ends. Only one ncRNA out of all 4303
ncRNAs mapped to a small RNA, namely U99 (Additional
file 8: Table S6), suggesting that this set of antisense
lncRNAs expressed in RCC has diverse functions other
than being precursors of small RNAs.
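
The coordinate cross-referencing described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Interval` type, the helper names, the half-open coordinate convention and all coordinates are hypothetical, and strand is ignored for simplicity.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def extend(iv: Interval, flank: int = 1000) -> Interval:
    """Extend an interval by `flank` bases at both ends, clamping at zero."""
    return Interval(iv.chrom, max(0, iv.start - flank), iv.end + flank)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def lncrnas_hitting_small_rnas(lncrnas, small_rnas, flank=1000):
    """Map each lncRNA name to the small RNAs overlapping its extended window."""
    hits = {}
    for name, iv in lncrnas.items():
        window = extend(iv, flank)
        matched = [s for s, siv in small_rnas.items() if overlaps(window, siv)]
        if matched:
            hits[name] = matched
    return hits


# Toy, made-up coordinates (not taken from the study).
lncrnas = {
    "lnc_A": Interval("chr1", 10_000, 10_600),
    "lnc_B": Interval("chr1", 50_000, 50_500),
    "lnc_C": Interval("chr2", 10_000, 10_600),
}
small_rnas = {
    "snoRNA_X": Interval("chr1", 11_200, 11_280),  # 600 bases beyond lnc_A
    "miRNA_Y": Interval("chr1", 60_000, 60_080),
}

hits = lncrnas_hitting_small_rnas(lncrnas, small_rnas)
print(hits)
```

In this toy example only `lnc_A` is reported, and only because of the 1 kb extension; with `flank=0` no lncRNA overlaps a small RNA.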

Next, we investigated the patterns of co-expression of
the antisense lncRNAs and the mRNAs expressed in cis
(both expressed from the same locus) or in trans
(expression of an antisense lncRNA correlated to the
expression of mRNAs from other loci). We started with
the 4303 intronic antisense lncRNAs expressed in renal
cancer, and analyzed their expression pattern in RCC
and in three human tissues previously studied by our
group with the same microarray platform [40], namely
normal liver, prostate tumor and kidney nontumor samples.
For the cis-correlation analysis, the Spearman correlation
was calculated using the expression levels of each
antisense lncRNA and the mRNA expressed from the
same locus measured across RCC and the three tissue
types. A total of 3467 (out of 4303; 81%) lncRNA/mRNA
pairs from the same locus were considered in the
analysis because they were detected in all tissues. We
identified a direct or inverse cis-correlation for the
expression in the four tissues of 929 (929/4303 = 22%)
antisense lncRNA/mRNA pairs from the same locus (Figure 4A
and Additional file 8: Table S6). These lncRNA/mRNA pairs
had significant (p < 0.05) correlation coefficients in the
range ρ < -0.5 or ρ > 0.5.
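
A minimal sketch of the cis-correlation filter described above is given here. It is an illustration only: the function names, the toy expression values and the locus labels are hypothetical, the significance test (p < 0.05) is omitted, and the Spearman coefficient is computed as the Pearson correlation of tie-averaged ranks, which is the standard definition but not necessarily the implementation used in the study.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant expression vector")
    return cov / (sx * sy)


def cis_correlated(loci, cutoff=0.5):
    """Keep loci whose lncRNA/mRNA pair has |rho| above the cutoff."""
    selected = {}
    for locus, (lnc, mrna) in loci.items():
        rho = spearman(lnc, mrna)
        if abs(rho) > cutoff:
            selected[locus] = rho
    return selected


# Toy values in the order: RCC, kidney nontumor, normal liver, prostate tumor.
loci = {
    "locus_1": ([5.0, 3.0, 1.0, 2.0], [50.0, 30.0, 10.0, 20.0]),  # direct
    "locus_2": ([5.0, 3.0, 1.0, 2.0], [10.0, 30.0, 50.0, 20.0]),  # inverse
    "locus_3": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 1.0, 3.0]),      # uncorrelated
}

selected = cis_correlated(loci)
print(selected)
```

Both direct and inverse correlations pass the filter because the cutoff is applied to the absolute value of ρ, mirroring the two-sided range quoted in the text.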

Next, we performed a gene enrichment analysis to identify
Gene Ontology (GO) terms that were overrepresented
among protein-coding genes whose expression was significantly
cis-correlated to the expression of intronic antisense
lncRNAs from the same loci. We found the term "biological
regulation" as the most enriched general term (p <
5.00E-7), followed by "cellular component organization",
"cellular process", "developmental process", "reproduction"
and "death" (Figure 4B). It is noteworthy that among
all enriched GO terms, the term "regulation" is present
in 40% (61/152). Among the enriched "biological regulation"
processes are the "regulation of cell growth", "regulation
of cell proliferation", "regulation of cell communication",
the "positive regulation of protein metabolic process" and
the "negative regulation of transcription from RNA pol II
promoter" (Additional file 9: Figure S3).

Considering only the positive cis-correlation for GO
enrichment analysis, 58 GO terms are enriched, and 98%
(57/58) of those are present in the complete cis-correlation
analysis. "Regulation of cellular process" is the most
frequent GO term. "Regulation" is present in 34% (20/58) of all
enriched GO terms. Considering only the negative
cis-correlation for GO enrichment analysis, 60% (32/53) of the
enriched terms are related to regulation, with "regulation of
metabolic process" as the main enriched GO term. Of the 53
terms, 32% (17/53) are exclusive GO terms that were not present
in the complete cis-correlation analysis. All GO-enriched terms
are shown in detail in Additional file 9: Figure S3 and
listed in Additional file 10: Table S7.



We observed with the Spearman analysis described
above that the expression of the majority of the antisense
lncRNAs (78%) was not cis-correlated to the expression
of the mRNA transcribed in the same locus
(Figure 4A). Therefore, to investigate subsets of intronic
antisense lncRNAs that were trans-correlated, we performed
a Spearman correlation analysis comparing the
level of each lncRNA with the expression levels of
mRNAs from all genomic loci represented in the 44 k
array, again using the data from RCC and from the three
other human tissues [40]. To favor the identification of
biologically relevant regulatory RNAs, only the 20%
most abundant intronic antisense lncRNAs in RCC (n =
860) were used for the trans-correlation analysis. A total
of 693 antisense lncRNAs (out of 860; 81%) and 5438
mRNAs that were detected in all tissues were used to
calculate a matrix of trans correlations. We identified
inverse or direct high trans-correlation values (ρ < -0.7 or
ρ > 0.7) between all 693 antisense lncRNAs and at least
one of 5293 mRNAs from different genomic loci (out of
5438 mRNAs) (Additional file 11: Figure S4), which corresponds
on average to the expression level of one antisense
lncRNA being trans-correlated to the expression of 7.6
different expressed mRNAs in the four tissues studied.
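
The trans-correlation matrix described above can be sketched as below. Everything in this example is an assumption made for illustration: the gene and lncRNA names, the expression values, the use of a single value per tissue, and the tie-free shortcut formula ρ = 1 − 6Σd²/(n(n² − 1)). Under that single-value assumption (n = 4, no ties), ρ can only take values in steps of 0.2, so the cutoff |ρ| > 0.7 selects pairs with |ρ| of 0.8 or 1.0.

```python
def rank_no_ties(values):
    """1-based integer ranks; refuses tied input, where the shortcut is invalid."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: use a tie-aware Spearman instead")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman rho through the rank-difference formula (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_no_ties(x), rank_no_ties(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def trans_partners(lnc_expr, mrna_expr, host_locus, cutoff=0.7):
    """For each lncRNA, list mRNAs outside its host locus with |rho| > cutoff."""
    partners = {}
    for lnc, x in lnc_expr.items():
        partners[lnc] = sorted(
            gene
            for gene, y in mrna_expr.items()
            if gene != host_locus[lnc] and abs(spearman_no_ties(x, y)) > cutoff
        )
    return partners


# Toy values in the order: RCC, kidney nontumor, normal liver, prostate tumor.
lnc_expr = {"lnc_1": [8.0, 6.0, 2.0, 4.0], "lnc_2": [1.0, 2.0, 3.0, 4.0]}
host_locus = {"lnc_1": "GENE_A", "lnc_2": "GENE_B"}
mrna_expr = {
    "GENE_A": [80.0, 60.0, 20.0, 40.0],
    "GENE_B": [3.0, 9.0, 1.0, 5.0],
    "GENE_C": [1.0, 2.0, 9.0, 5.0],
    "GENE_D": [10.0, 20.0, 30.0, 40.0],
}

partners = trans_partners(lnc_expr, mrna_expr, host_locus)
mean_partners = sum(len(v) for v in partners.values()) / len(partners)
print(partners, mean_partners)
```

The host gene of each lncRNA is excluded so that only correlations in trans are counted. The average reported in the text follows the same arithmetic as the ratio 5293/693, which rounds to 7.6.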

Next, using Genomica software and the matrix of
trans correlations as input, we constructed a module map
of antisense lncRNAs versus GO enriched terms among
the trans-correlated mRNAs (Figure 4C). We identified
106 intronic antisense lncRNAs positively and negatively
associated with 51 enriched GO terms (p < 0.05, Bonferroni
correction). Among those GO terms with correlated lncRNAs
are "response to stress", "inflammatory response",
"metabolic process", "immune response", "RNA processing",
"response to stimulus", and "ion transporter activity"
(Figure 4C; Additional file 10: Table S7D).

Intronic antisense lncRNAs expressed in RCC are enriched
in genomic marks that suggest an independent gene
expression regulation

To determine if regulatory elements occur at and are frequent
in the genomic regions of the intronic antisense
lncRNAs expressed in RCC, we compared the overlap distribution
of genomic coordinates of these lncRNAs with
datasets of genomic coordinates of Cap Analysis Gene
Expression (CAGE) tags from PolyA+ RNA-derived libraries
from 35 cell lines [65], of predicted CpG islands [66],
of HMM active promoter predictions [59] and of ChIP-seq
data for RNA Polymerase II binding sites [65] and histone
modification marks [59,60,67]. A set of random sequences
was used as control. Because the transcripts we had identified
as expressed in RCC are mainly polyA+, given that
our microarray experiments were performed using
oligo-dT primed cDNA synthesis and labeling, we chose to use
the PolyA+ RNA-derived ENCODE datasets.

Figure 4 Functional associations of intronic antisense lncRNAs expressed in RCC. (A) Cis-correlation analysis. Histogram of Spearman
correlation values calculated using the expression levels of intronic lncRNAs and mRNAs expressed in 4303 gene loci, across RCC and three other
human tissues (normal liver, prostate tumor and kidney nontumor). (B) GO enrichment analysis of the mRNAs correlated in cis with the lncRNAs
from the same loci (Spearman correlation ρ < -0.5 or ρ > 0.5; p < 0.05; see red broken lines in panel A). Color scale indicates increasingly higher
statistical significance of enriched GO terms: Yellow, p = 0.05; Dark orange, p < 0.0001. (C) Trans-correlation analysis. Module map of lncRNAs and
GO enriched terms among trans-correlated mRNAs. Analysis was performed with the 20% most abundant lncRNAs (columns) that showed
Spearman correlation values in the ranges ρ < -0.7 or ρ > 0.7 between their expression level in RCC and in three other human tissues (normal liver,
prostate tumor and kidney nontumor) and the expression of mRNAs outside the host locus (correlation in trans; p < 0.05); GO terms significantly
enriched among trans-correlated mRNAs are shown in the rows (p < 0.05 with Bonferroni correction). Colors indicate if the majority of the mRNAs
within that GO is directly (yellow) or inversely (blue) correlated with the lncRNA. A black entry indicates no significant enrichment. The lists of GO
enriched terms and of mRNAs belonging to each term for panels 4B and 4C are given in Additional file 10: Table S7.



A significant association of CAGE tags with the putative
antisense lncRNA TSSs was identified (Figure 5A).
This finding is analogous to the reported presence of the
5' cap modification at the TSS of lincRNAs [29]. CAGE
tags are mostly present within the first kb from the
known TSS of antisense lncRNAs and of mRNAs. This
distribution is statistically different (Kolmogorov-Smirnov
(KS) test p < 0.001) from that observed for the control
set of random sequences (Figure 5A). Next, we identified a
significant association (KS test p < 0.001) between the
predicted TSS of intronic antisense lncRNAs and CpG
islands (Figure 5B), active promoter HMM predicted
regions (Figure 5C) or RNA polymerase II binding sites
(Figure 5D).
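
The comparison of distance distributions by the KS test can be illustrated with the sketch below. It is a simplified, hypothetical example: the distances are synthetic (evenly spaced values, not real CAGE or ChIP-seq data), the function name is invented, and the p-value uses the asymptotic Kolmogorov series without small-sample corrections, which may differ from the software used in the study.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs over all data points.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    lam = d * sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here, and the p-value is essentially 1.
        return d, 1.0
    # Kolmogorov series: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


# Synthetic distances (bp) of a mark from the TSS within a +/- 5 kb window.
# "near_tss" mimics marks concentrated around the TSS; "control" mimics a
# uniform spread, as might be seen for random intronic sequences.
near_tss = list(range(-500, 501, 50))
control = list(range(-5000, 5001, 250))

d_stat, p_value = ks_two_sample(near_tss, control)
print(d_stat, p_value)
```

With real data each list would hold one distance per mark/TSS pair; the test then asks whether marks cluster around lncRNA TSSs more than around the random control positions.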



We also identified a significant association between
the transcriptional activation histone marks H3K27ac
(Figure 5E) or H3K4me1 (Figure 5F) and the putative TSSs
of the intronic antisense lncRNAs (KS test p < 0.001); the
analysis was performed with data from seven different
human cell lines [59]. We looked at histone modification
marks in renal tissue datasets [60], and found that the
promoter-associated H3K4me3 mark (Figure 5G) and
the activation-associated H3K36me3 mark (Figure 5H)
showed a statistically significant higher frequency (KS test
p < 0.001) at the genomic regions of the intronic antisense
lncRNAs transcribed in RCC. The transcriptionally repressive
H3K27me3 mark was not identified in the renal tissue
public data [60] at the TSSs of the antisense lncRNAs
(data not shown); this was expected because these lncRNAs
are the ones detected as expressed in RCC.

Figure 5 Regulatory genomic marks associated with intronic antisense lncRNAs expressed in RCC. Red lines show the abundance
distribution of CAGE tags (A), CpG islands (B) and histone marks (C-G) within a distance of 5 kb from the TSSs of the intronic antisense lncRNAs
expressed in RCC. For comparison, the abundance distributions of these marks for an equal number of protein-coding mRNAs (black lines), or for a
control set of randomly selected intronic genomic sequences with the same length as the expressed lncRNAs (grey lines), were calculated.
(A) CAGE tags, (B) CpG islands, (C) active promoter HMM predictions, (D) RNA polymerase II binding sites, (E) transcriptional activation histone mark
H3K27ac, (F) transcriptional activation histone mark H3K4me1, (G) promoter-associated histone mark H3K4me3, and (H) activation-associated histone
mark H3K36me3. In parentheses are the significance p-values of Kolmogorov-Smirnov (KS) statistical tests for differences in abundance distribution in
relation to the control random set.

Intronic antisense lncRNAs expressed in RCC are
specifically expressed in other tissues

To investigate the tissue-specificity of the 4303 intronic
antisense lncRNAs expressed in RCC, we cross-referenced
the genomic coordinates of our dataset with the coordinates
of RNA-seq reads from nine human tissues [68]
(Figure 6A) and with RNA-seq data of strand-oriented
libraries from seven human cell lines [69] (Figure 6B). In
the human tissues analysis, we found that 15% of the antisense
lncRNAs (628 out of 4303) were detected only in
RCC (Figure 6A). A total of 3675 lncRNAs were detected
in at least one of the nine tissues (Figure 6A). In the
strand-oriented data from human cell lines, we found that
71% of the antisense lncRNAs (3064 out of 4303) were
detected only in RCC (not shown). A total of 1239
lncRNAs (out of 4303, i.e. 29%) were detected in at least
one of the seven cell lines (Figure 6B). A similar well-marked
expression pattern was observed for protein-coding
genes across tissues and cell lines in RCC
(Additional file 12: Figure S5), with the notable exception
that the majority of these protein-coding genes (94%, i.e.
5296/5632) were detected in at least one of the
strand-oriented RNA-seq datasets from the human cell lines.

Intronic antisense lncRNAs expressed in RCC are
evolutionarily conserved

Expression conservation was evaluated by comparing the
intronic antisense lncRNAs detected in RCC with cDNAs
expressed in 15 vertebrate species that are compiled in the
TransMap cross-species alignments [70]. This analysis revealed
that 60% of the intronic antisense lncRNAs
expressed in RCC (2594 out of 4303) are expressed in at
least one other species (Figure 7A). There is a higher proportion
of expression conservation across the species in
the lncRNA dataset compared with 10 control random
sets of sequences extracted from the human genome
(Fisher test p < 0.0001) (Figure 7B).

Figure 6 Tissue expression pattern of intronic antisense lncRNAs. (A) Heat map representing the abundance of 4303 RCC-expressed intronic
antisense lncRNAs (columns) across nine other human tissues (rows) based on public RNA-seq data [68]. Color intensity represents fractional
density expression of each lncRNA across all tissues (see Material and methods for details). There are 628 lncRNAs (out of the 4303; i.e. 15%) at
the right hand side of this panel that were exclusively detected in RCC. (B) Heat map indicating the presence (red) or the absence (white) of
1239 RCC-expressed lncRNAs (columns) in seven human cell lineages (rows) from public strand-oriented RNA-Seq libraries [69]. These 1239
intronic antisense lncRNAs represent 29% of the 4303 lncRNAs detected in RCC; the other 3064 lncRNAs (71%) were detected exclusively in RCC,
not in the cell lines (not shown). Expression data were hierarchically clustered.

To further explore the conserved pattern of expression
of these 4303 intronic antisense lncRNAs, we compared
them with the 4858 introns harboring functional antisense
ncRNAs recently identified by large scale RNA-seq
in the mouse lung in response to inflammation [38]. A
total of 1220 intronic regions could unequivocally be
mapped to human genomic loci, and their corresponding
coordinates were cross-referenced to the coordinates of
the 4303 intronic antisense lncRNAs expressed in RCC.
A total of 53 lncRNAs were detected as expressed in
common both in mouse and in human, at syntenic loci,
and the genomic coordinates are given in Additional file
8: Table S6. The length of overlap was in the range of 30
to 1228 bases among the 53 lncRNAs (Additional file 8:
Table S6). We found a significantly higher proportion of
expression overlap between mouse and RCC (53 out of
4303 intronic loci expressed in RCC) compared with a
control random set of lncRNA sequences extracted from
the subset of lncRNAs with no evidence of expression in
RCC, among the entire set of intronic antisense
lncRNAs probed in the 44 k array (overlap of 23 out of
4303 random intronic loci with no evidence of expression
in RCC) (Fisher test p < 0.001).
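
The overlap comparison above (53 of 4303 expressed loci versus 23 of 4303 control loci) can be checked with a Fisher exact test, sketched here. The function name is hypothetical, and the choice of a one-sided test for enrichment is an assumption, since the text does not state whether the reported test was one- or two-sided.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the count
    in the top-left cell with all row and column totals held fixed.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    numer = sum(
        comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, upper + 1)
    )
    return numer / denom


# Rows: RCC-expressed loci vs. control loci without evidence of expression.
# Columns: overlapping a mouse intronic antisense ncRNA vs. not overlapping.
expressed_overlap, expressed_total = 53, 4303
control_overlap, control_total = 23, 4303

p_overlap = fisher_exact_greater(
    expressed_overlap,
    expressed_total - expressed_overlap,
    control_overlap,
    control_total - control_overlap,
)
print(p_overlap)
```

Python integers are arbitrary precision, so the binomial coefficients are computed exactly and only the final ratio is converted to a float.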

Comparison of the genomic coordinates of the 4303
intronic antisense lncRNAs expressed in RCC with those
from conserved DNA elements identified in vertebrates,
placental mammals and primates (PhastCons, [61])
revealed a significant enrichment as compared with a
random set of genomic sequences used as a control
(Fisher test p < 0.0001) (Figure 7C). RNAz analysis [71]
predicted secondary structure conservation for 131 intronic
antisense lncRNAs (Additional file 8: Table S6
and secondary structures at
http://verjol01.iq.usp.br/sites/projetosLab/fachel/structures/results.html).
There are 73 antisense lncRNAs in common to all three
conservation analyses described above (Figure 7D).

Discussion 

In the present study, we determined the expression pattern
of a collection of intronic lncRNAs in clear cell RCC
patients and identified candidates that might play a role in
renal cancer biology. There are only two published studies
of lncRNAs in RCC so far: our previous study [49], which
identified for the first time seven intronic lncRNAs differentially
expressed in RCC among a protein-coding gene
signature; and the work of Yu et al., which identified 626
lncRNAs differentially expressed between tumor and nontumor
tissue in 6 clear cell RCC patients. These authors
used a microarray that essentially probed intergenic
lncRNAs [50], and they validated four transcripts by qPCR:
three were intergenic lncRNAs (ENST00000456816,
X91348 and NR_024418), and one was not a lncRNA but
rather the non-coding 3'-end portion of the TMEM72
protein-coding gene (BC029135).

We identified 29 lncRNA transcripts originating from
intronic regions and an additional 11 from intergenic regions,
resulting in a ccRCC-associated gene expression
profile comprised exclusively of lncRNAs. From this set,
there are three intronic lncRNAs from the ACTN4,
HDAC5 and SLC2A1 loci identified as down-regulated
both here and in our previous study [49] using the same
microarray platform. This partial overlap (3 out of the 6
intronic lncRNAs described in Ref. [49]) is possibly related
to the more stringent statistical criteria presently
used, namely the leave-one-out approach that minimizes
the contribution of each individual patient to the set of
significantly altered genes when a small patient cohort is
analyzed [51,72].

The comparison of our 217 protein-coding gene profile 
with nine published studies of differentially expressed 
protein-coding genes in ccRCC [5,49,52-58] verified that 
the vast majority (83%) of the genes in common (142/170) 
presented a concordant pattern of expression (Table 2), 
thus validating the present analysis as representative of the 
ccRCC biology. 

Besides a set of intronic lncRNAs potentially involved
in carcinogenesis, the present study identified a set of 26
intronic lncRNAs that were correlated with the survival of
ccRCC patients. From this set, eight lncRNAs were identified
as altered in both the malignancy and the survival
outcome expression profiles, and they are transcribed
from the loci ACTN4, CSNK1D, DNAJC3, GIGYF2,
HDAC5, PTPN3, RAB25 and VPS13B. To the best of our
knowledge, this is the first study suggesting lncRNAs as
correlated with the patient survival outcome in RCC.
Regarding other types of ncRNAs, there are at least two
miRNA expression studies that have identified candidates
correlated with patient survival outcome in RCC [21,73].
The lncRNAs identified in the present work may contribute
to future studies focusing on lncRNAs as molecular
markers in RCC oncology.

There are few examples of well-characterized lncRNAs
associated with RCC. The lincRNA GAS5 is a well-described
tumor suppressor in breast cancer [74], and very
recently it was described in prostate cancer cell lines [75]
and in RCC [76]. A decreased expression of the lincRNA
GAS5 is associated with RCC genesis and progression, and
its overexpression is associated with inhibition of cell
proliferation and induction of apoptosis [76]. Another example
includes two antisense lncRNAs at the 5' (5'aHIF-1α) and
3' (3'aHIF-1α) ends of the human HIF-1α gene that are
expressed in human kidney cancer tissues [77].

In cancer, there are a few examples of the mechanisms
of action of intronic lncRNAs. Our group described the
intronic antisense and unspliced lncRNA ANRASSF1
that causes the epigenetic in cis downregulation of the
tumor suppressor RASSF1A gene and increases cell proliferation
[43], and its expression is higher in prostate
and breast cancer cell lines compared with nontumor
cells [43]. Guil et al. [42] found that overexpression
of the sense intronic lncRNA from the SMYD3 locus
caused the epigenetic in cis regulation of SMYD3 and a
decrease in colorectal cancer cell line proliferation [42].
The androgen-regulated intronic antisense lncRNA
CTBP1-AS [44] appears to be a key antisense ncRNA
that acts as both a cis- and a trans-regulator of gene expression.
The CTBP1-AS lncRNA promotes prostate cancer
growth through sense-antisense repression of the transcriptional
co-regulator CTBP1 transcribed from the
same locus (cis-regulation), and through a global epigenetic
regulation of tumor suppressor genes (trans-regulation)
[44]. In fact, the intronic and also the intergenic lncRNAs
play important epigenetic roles in cancer [78].

We decided to study the intronic lncRNA ncHDAC5
in more detail because it showed a decreased expression
in ccRCC tumor compared with nontumor tissue that
was confirmed by qPCR, and because its increased expression
seems to be associated with cancer-related death
after surgery in RCC, as suggested by our patient survival
outcome analysis. We determined that ncHDAC5 is an
unspliced long transcript (1.7 kb long), detected in the
antisense and sense directions relative to the protein-coding
gene histone deacetylase 5 (HDAC5). It has an
evolutionarily conserved secondary structure and a short
half-life of 42 min compared with other well-studied
lncRNAs, such as Air, Kcnq1ot1 and Xist, which have
half-lives of 2.1, 3.4 and 4.6 h, respectively [79]. The absence of
association between the expression of ncHDAC5 and the
protein-coding mRNA HDAC5, determined by qPCR and
by a meta-analysis of five kidney cancer gene expression
studies (Table 1), suggests a locus-independent function,
with ncHDAC5 possibly acting in trans to regulate
protein-coding genes (see the discussion on trans regulation
below). Unfortunately, a probe for ncHDAC5 was
not present in the 44 k oligoarray that was used for assessing
the trans correlation of expressed lncRNAs/mRNAs,
and it was not possible to determine the ncHDAC5 candidate
target mRNAs by our co-expression analysis.

An in silico analysis indicated the presence of RNA
Pol II binding and of the histone marks H3K27ac and
H3K4me3 at -1.5 kb upstream of the putative TSS of an
antisense ncHDAC5 transcript in the HDAC5 locus.
Considering the lack of methylation marks in the vicinity
of the lncRNA, this observation opens an interesting
possibility of transcriptional regulation of the antisense
lncRNA ncHDAC5 by histone acetylation. It is in line
with the result recently described for lncRNA-LET, a
lncRNA generally downregulated in carcinomas, which
was shown to be repressed by histone deacetylase 3 under
hypoxic conditions [80]. Interestingly, the
transcriptional-activation-associated H3K4me1 and H3K27ac histone
modification marks at human enhancers have been described
as related to cell-type specific protein-coding
gene expression [81]. The TSSs at the lncRNA ncHDAC5
locus as well as at the loci of the other intronic antisense
lncRNAs expressed in RCC were enriched with both histone
marks, in agreement with the fact that intronic
lncRNAs tend to have a tissue-specific pattern of expression
[9], thus supporting a possible cell-type specific
modulation of intronic antisense lncRNAs by histone
methylation and acetylation.

Because the intronic lncRNAs revealed a promising
well-defined pattern of altered expression in RCC, and
there are scarce data about this ncRNA class in RCC, we
extended our study to the antisense intronic lncRNAs
using a custom-designed strand-specific 44 k-element
microarray that contained 15-fold more probes for
lncRNAs than the 4 k-array that we had previously used.
With this new platform, we identified 4303 antisense intronic
lncRNAs expressed in RCC; we found that 4061
out of the 4303 antisense lncRNAs have not been previously
reported in the Yu et al. study [50] as being
expressed in RCC, which is in agreement with the fact that
Yu et al. [50] used a microarray that probed mostly intergenic
lncRNAs. In addition, only six lncRNAs are already
annotated as RefSeq noncoding RNAs (Additional file 8:
Table S6). In fact, the most recent catalog of human intronic
lncRNAs comes from the GENCODE project [9],
which documented the intronic lncRNAs expressed in 12
human normal tissues. Thus, the present study is a contribution
towards the generation of a catalog of intronic antisense
lncRNAs expressed in renal cancer.

The set of 4303 intronic antisense lncRNAs expressed
in renal cancer identified in the present study probably
has diverse functions, other than being precursors of
small RNAs, because only one lncRNA mapped to a
known small RNA sequence (U99, Additional file 8:
Table S6). We found that 22% of the intronic antisense
lncRNAs have expression levels in RCC, normal kidney,
normal liver and tumor prostate that are correlated in
cis to the expression levels of the mRNA from the same
locus. These lncRNAs correlated in cis are transcribed
from loci enriched with genes related to regulation, including
the term "Regulation of Transcription from RNA
polymerase II", as seen when analyzing together the positively
and negatively cis-correlated antisense lncRNA/mRNA
pairs as well as when analyzing only the positively
cis-correlated transcripts (Additional file 9: Figure S3). Our
group has described a similarly enriched GO term when
analyzing the host gene loci of the 30% most abundant intronic
antisense lncRNAs, without considering any expression
correlation between the ncRNAs and the mRNAs
[40]. Now we point to this GO term enrichment for those
loci expressing the antisense lncRNAs and the mRNAs in
a correlated manner, reinforcing the suggestion that the
lncRNAs might cis-regulate the expression of the genes involved
in "Regulation of Transcription" and/or that the
antisense lncRNAs and the mRNAs might be controlled
by a similar regulatory event in these loci.

We found that the expression of the majority of the
intronic antisense lncRNAs was not correlated to the expression
of the mRNA from the same locus, and those
are most likely regulated independently of the
mRNAs. Among these, we identified a set of antisense
lncRNAs whose expression in RCC, normal liver, prostate
tumor and kidney nontumor tissues was positively
or negatively correlated in trans to the expression levels
of sets of mRNAs belonging to enriched GO terms such
as "Inflammatory response" and "Response to stress";
these protein-coding genes may be related to the cellular
renal cancer context, and the correlated lncRNAs are
candidates to be acting in trans to regulate their expression.
The present GO analyses support the proposal that
ncRNAs might be part of a fine-tuning regulatory network
in the cells [82-84].

Our computational analysis has generated a list of 4303
intronic antisense lncRNAs expressed in RCC (Additional
file 8: Table S6) that includes subsets associated with CpG
islands, CAGE tag marks, RNA pol II binding sites,
promoter-associated chromatin marks, tissue-specificity
and evolutionary conservation. The set of 53 intronic antisense
lncRNAs expressed in common at syntenic loci in
human and mouse represents good candidates for subsequent
in-depth biological follow-up work; the low overlap
may be related to the known tissue-specific expression of
lncRNAs [8,41] and to the known tissue-pattern of expression
conservation among different species [85], considering
that St. Laurent et al. [38] used mouse lung tissues and
we have used human kidney tumor tissues. Although
lncRNAs are much less conserved than other functional
ncRNAs such as miRNAs and snoRNAs [86], there is
good evidence in the literature regarding the presence
among the intronic lncRNAs of evolutionarily conserved
regions spanning 400 nt or more [39,85,87]. Our recent
work with pancreatic cancer has identified an enrichment
of conserved regions within intronic and intergenic
lncRNAs [46], and here we extend the identification of
conserved regions to the intronic antisense lncRNAs
expressed in RCC. Although some of the introns could
contain regulatory sequences, or yet undiscovered coding
exons overlapped by the intronic RNAs, thus accounting
for part of the enrichment signal, the observed primary
and secondary structure conservation suggests that the
intronic lncRNAs are under the influence of evolutionary
constraints.

In silico approaches have been successfully used to
characterize sets of lncRNAs expressed in other tissues or
cell lineages [9,28,29,46,69]. Here, we used them to obtain
new data indicating that intronic lncRNAs should not be
regarded simply as by-products of random transcription
[38], but rather as a diverse and heterogeneous class of
cellular transcripts that may comprise yet uncharacterized
regulatory RNAs. The intronic lncRNAs identified
here as expressed in RCC may have several mechanisms of
action, both positively and negatively regulating gene expression,
and as a consequence, may constitute a promising
starting point for further functional investigations.

Material and methods 

Patient tissue material 

Individual tissue samples were analyzed for gene expression
with a 4 k-element array described below. The 29
tissue samples consisted of 18 primary renal tumors and
11 matched adjacent nontumor tissues from 18 patients
who underwent radical nephrectomy for clear cell RCC
at the Hospital of the Instituto Nacional de Câncer
(INCa), Brazil. Ethical approval for the study was
granted by the INCa institutional review board under the ID
number 2701; all patients signed an informed consent.
Each sample was frozen and stored in liquid nitrogen
immediately after surgery. A fraction of each sample was
processed for histopathological diagnosis. A second independent
histopathological diagnosis of each case was
confirmed by a reference pathologist (GV) who belongs
to the INCa staff. Histologically normal renal tissue fragments
were collected from a distant portion of the surgically
removed kidney. Clinical and anatomopathological
patient data are detailed in Additional file 1: Table S1.
The malignancy gene expression profile was identified
with the 11 paired tumor and adjacent nontumor patient
samples. Identification of the survival gene expression
profile was performed with 16 tumor samples (nine tumors
from the paired samples mentioned above for
which the survival information was available, plus seven
tumor samples for which only the survival information
and not the paired nontumor tissue was available). Patient
survival was recorded from the date of nephrectomy
to the date the patient died or was last known
alive (follow-up ranged from 60 to 66 months) and patients
were identified as alive without disease (n = 8) or
dead from cancer (n = 8). Expression in kidney tissue
samples was also measured with a 44 k-element oligoarray
described below using 4 pools of nontumor (N) samples
from 17 RCC patients, and 4 pools of the corresponding 17 paired
tumor (T) samples. They comprise all 11 tumor-nontumor
paired clear cell renal cell carcinoma (ccRCC)
cases that were analyzed individually with the 4 k platform
(Additional file 1: Table S1), plus three other ccRCC cases,
two papillary RCC cases and one chromophobe RCC case.
Clinical and anatomopathological data of these patients
are detailed in Additional file 7: Table S5.

Microarray platforms design 

The custom 4 k-element microarray platform previously 
described by our group [45] is composed of 3355 unique 
cDNA probes from the Cancer EST Sequencing Project 
[88] spotted in duplicate (average length of 600 bp), plus 
positive and negative controls; 2371 probes interrogate 
cancer-related protein-coding genes compiled from the 
Entrez, OMIM and CGAP databases; an additional set of 
984 probes was randomly sampled from cDNA clones 
whose sequences showed no similarity to protein-coding 
genes in GenBank, of which 722 are putative noncoding 
transcripts that map to intronic regions of known genes, 
188 map to intergenic regions of the genome and 74 
ESTs map to known RefSeq lincRNAs (a total of 262 pu- 
tative lincRNAs). Probes were mapped and annotated 
according to the hg19 assembly of the human Genome
Reference Consortium (GRC) based on the RefSeq and 
UCSC datasets. The 4 k-array description is deposited in 
Gene Expression Omnibus (GEO) data repository under 
accession number GPL3985. 

The custom 44 k-element oligoarray platform designed 
by our group and manufactured by Agilent Technologies 
was previously described [40]. Essentially, the array
comprises strand-specific 60-mer oligonucleotide
probes designed for both the plus and the minus genomic
strands of 6,258 totally intronic noncoding (TIN) RNA 
loci and 4,267 partially intronic noncoding (PIN) RNA 
loci with evidence of transcription from dbEST [40], for 
a total of 21,050 strand-specific probes that interrogate 
both strands of 10,525 unique intronic loci within 6,361 
unique protein-coding spliced human genes; the latter 
are represented by unique probes from the Agilent 
Whole Human Genome Oligo Microarray. Probes were 
mapped and re-annotated according to the hg19 assembly
of the human Genome Reference Consortium (GRC)
based on the RefSeq and UCSC datasets. The 44 k-array 
description [40] and the re-annotation are deposited in 
the GEO repository under accession number GPL9193. 

Microarray experiments 

Total RNA was isolated from frozen tissues with TRIzol
reagent (Life Technologies) according to the manufacturer's
recommendations, followed by DNase I treatment for 
20 min and purification with the RNeasy Mini kit 
(Qiagen). Purified total RNA was quantified in the Nano- 
Drop ND-1000 spectrophotometer (Thermo Fisher Scien- 
tific), and checked for integrity with the 2100 Bioanalyzer 
(Agilent Technologies). 

For the 4 k-element microarray assays, complementary
RNA (cRNA) for each of the 29 samples (Additional file
1: Table S1) was obtained by linear amplification following
the Wang protocol [89]. Briefly, cDNA for each sample was
synthesized from 3 µg of total RNA using an oligo-dT primer
incorporating a T7 RNA promoter and SuperScript III Reverse
Transcriptase (Invitrogen). Double-stranded cDNA was
obtained using a template switch oligo primer with the
Advantage cDNA Polymerase mix kit (Clontech). Subsequently,
cRNA was generated in vitro with MegaScript T7 RNA
Polymerase (Ambion). A second round of amplification was
performed with 1 µg of cRNA obtained in the previous step,
in the presence of amino-allyl-UTP (Ambion). The
amino-allyl-cRNA was coupled with Cy5 reactive dye
(Amersham Pharmacia Biotech). Labeled cRNAs were purified
using the RNeasy Mini kit (Qiagen) and hybridized to a
total of 29 microarray slides, followed by washing and
drying in an automated Hybridization Station (GE
Healthcare) according to the manufacturer's
recommendations. Array images were acquired with a
Generation III Array Scanner (GE Healthcare). Data were
extracted from the scanned images with Array Vision 6.0
software (GE Healthcare).

For the 44 k-element microarray assays, four pools of
tumor (T1 to T4) or nontumor (N1 to N4) paired samples
from 17 patients were assembled (three pools of four
samples and one pool of five samples), as detailed in
Additional file 7: Table S5, by mixing equal amounts of
total RNA. Each total RNA pool (300 ng) was used as
template for the amplification of poly(A) RNA by T7 RNA
polymerase with the Low RNA Input Fluorescent Linear
Amplification kit (Agilent Technologies), which generated
Cy5- or Cy3-labeled cRNAs. A total of four array slides
were hybridized with 750 ng each of Cy3- and Cy5-labeled
cRNAs, in the following arrangement: Cy3 2T x 1N Cy5;
Cy3 2N x 1T Cy5; Cy3 4T x 3N Cy5; Cy3 4N x 3T Cy5.
Hybridization was performed with the Agilent in situ
Hybridization kit-plus, as recommended by the manufacturer.
The slides were washed and processed according to the
Agilent Oligo Microarray Processing protocol and were
scanned on a GenePix 4000B scanner (Molecular Devices).
Intensity data were extracted from the scanned images with
the Agilent Feature Extraction software (Agilent
Technologies). All the above microarray data are deposited
at the GEO repository under accession number GSE40914.



Microarray data analyses 

For the 4 k-element microarray, a gene was considered 
expressed if its probe intensity was higher than the local 
background intensity and above the threshold defined by 
the average intensity plus three standard deviations of a 
set of plant-derived negative control cDNA probes (GE 
Healthcare). Probes were excluded from further analyses
when they were detected in fewer than 90% of the arrays
in either of the two groups, i.e. nontumor or tumor for the
malignancy analysis; or alive or dead from cancer for the 
survival analysis. The raw intensities were normalized by 
the quantile method [90]. 
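
The filtering and normalization rules above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the function names are hypothetical, the sample standard deviation is an assumption (the text does not say which estimator was used), and ties in the quantile step are broken by position, which is simpler than the published method [90].

```python
from statistics import mean, stdev


def detection_threshold(negative_controls, n_sd=3.0):
    """Mean plus n_sd standard deviations of the negative-control intensities."""
    return mean(negative_controls) + n_sd * stdev(negative_controls)


def is_detected(intensity, background, threshold):
    """A spot counts as detected if it exceeds both local background and threshold."""
    return intensity > background and intensity > threshold


def passes_group_filter(detected_by_group, min_fraction=0.9):
    """Keep a probe only if it is detected in >= min_fraction of arrays in every group."""
    return all(
        sum(flags) / len(flags) >= min_fraction
        for flags in detected_by_group.values()
    )


def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array).

    Assumption: ties are broken by original position (a simplification).
    """
    n_probes = len(arrays[0])
    sorted_cols = [sorted(col) for col in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_probes)]
    normalized = []
    for col in arrays:
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every array shares the same intensity distribution, so the tumor/nontumor ratios computed afterwards are not driven by array-wide intensity differences.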

For the 4 k-element microarray malignancy study,
tumor/nontumor log2 ratios were calculated, followed by
a supervised one-class statistical analysis with the
Significance Analysis of Microarrays (SAM) tool [91] with
1000 permutations to identify transcripts that were
differentially expressed between eleven clear cell RCC
samples and adjacent nontumor tissue. A sample
leave-one-out cross-validation was performed [72,92].
Essentially, one sample was removed at a time, and each
time a new set of significantly altered genes was
determined with SAM using the remaining ten samples. This
procedure was repeated for each of the matched
tumor/adjacent nontumor tissue samples; a false discovery
rate (FDR) cutoff <5% was used in all eleven leave-one-out
datasets. This approach was used to minimize the
contribution of each individual patient sample to the set
of significantly altered genes [51]. The final gene
profile comprises the altered genes present in 100% of the
leave-one-out datasets that also met a 1.5-fold minimal
change criterion. For the 4 k-element microarray survival
study, a two-class unpaired SAM analysis (FDR <10%) [91]
combined with Golub's discrimination score analysis
(p < 0.01) [93] was used to identify transcripts expressed
in clear cell RCC samples that were significantly
correlated with the patient survival outcome. Only those
genes found in both analyses were used to compose a profile
of genes correlated with the outcome. The 16 patient
samples were ordered in the heat maps according to the
correlation of their gene expression profiles with the
average expression profile obtained from the 8 samples of
patients who died from the disease within the 5-year
follow-up after surgery.
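
The leave-one-out logic of the malignancy analysis can be illustrated as follows. This is a sketch under stated assumptions: SAM itself is not reimplemented, so `select` is a placeholder for any procedure that returns the significant genes, `toy_select` is a hypothetical stand-in based on a plain mean log2-ratio threshold, and applying the 1.5-fold criterion to the mean log2 ratio over all patients is an assumption, since the text does not specify how the fold change was computed.

```python
from math import log2
from statistics import mean


def leave_one_out_signature(log_ratios, select, min_fold=1.5):
    """Genes selected in every leave-one-out run that also meet a fold-change cutoff.

    log_ratios: dict gene -> list of tumor/nontumor log2 ratios, one per patient.
    select: callable taking such a dict and returning the significant genes
            (placeholder for the SAM call, which is not reimplemented here).
    """
    n_samples = len(next(iter(log_ratios.values())))
    stable = None
    for left_out in range(n_samples):
        # Drop one patient and re-run the selection on the remaining ones.
        subset = {
            gene: [v for i, v in enumerate(values) if i != left_out]
            for gene, values in log_ratios.items()
        }
        selected = set(select(subset))
        stable = selected if stable is None else stable & selected
    cutoff = log2(min_fold)
    # Assumption: the fold-change criterion is applied to the overall mean.
    return {g for g in stable if abs(mean(log_ratios[g])) >= cutoff}


def toy_select(subset, min_abs_mean=0.5):
    """Hypothetical stand-in selector: NOT SAM, only a mean log2-ratio threshold."""
    return {g for g, values in subset.items() if abs(mean(values)) > min_abs_mean}
```

Requiring a gene to survive every leave-one-out run removes candidates whose apparent change depends on a single patient, which is the stated purpose of the procedure.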

For the 44 k-element oligoarray, transcripts were
considered expressed if the intensity of the spot was
above the mean intensity plus 2 SD of the negative control
spots in 3 out of 4 oligoarrays in one of the two groups
(tumor or nontumor pools). For the intronic lncRNA
transcripts, only the probe mapping to the genome in the
antisense direction relative to the protein-coding mRNA in
the locus was considered for further analyses.

Real time RT-PCR

Reverse transcription was performed with 1 µg aliquots of
DNase I-treated purified total RNA from the same paired
samples that were used in the microarray experiments,
oligo-dT primers and SuperScript III (Invitrogen)
according to the manufacturer's instructions. For the
relative quantification of transcript levels, real-time
PCR was performed using Power SYBR Green PCR Master Mix
(Applied Biosystems) on an ABI PRISM 7500 machine (Applied
Biosystems) and the following primers: ncC11orf49,
GAGAAGCAGCGATGACACGAT (Forward), AGAGGAGCAACCCTCAGGAAA
(Reverse); HDAC5 exon 24/25, TGCAGCAAAAGCCCAACAT
(Forward), AGACCAGCGGCGAACTTCT (Reverse); ncHDAC5,
TATTCTGGAGTCGCCTGTGCTT (Forward), AACCACAGCCCTATTGGTATGC
(Reverse); ncRAB31, CCCAGTGAGAGTGATATTTTGTTATGA (Forward),
CCACACCTTCTTTCTGCCTGTT (Reverse); ncSRPK1,
CAAGGGCTGAGTCCTTTTTCA (Forward), GCAGTGCCTTGCCCTTATTG
(Reverse); HPRT1, TGACACTGGCAAAACAATGCA (Forward),
GGTCCTTTTCACCAGCAAGCT (Reverse). The reactions were
incubated at 95°C for 15 min, prior to 40 PCR cycles
(15 sec at 95°C, 60 sec at 60°C). All reactions were
performed in triplicate in a final volume of 20 µl
containing 5 µl of diluted cDNA (1:3) and 800 nM of forward
and reverse gene-specific primers. The gene expression
levels of hypoxanthine phosphoribosyl transferase 1
(HPRT1) were used as a control to normalize the
measurements. Transcript levels were expressed following
the $2^{-\Delta\Delta Ct}$ method [94], where
$\Delta\Delta Ct = \Delta Ct_{\text{candidate gene in tumor sample}} - \Delta Ct_{\text{candidate gene in nontumor sample}}$,
with $\Delta Ct = Ct_{\text{candidate gene}} - Ct_{\text{HPRT1}}$.
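
As a worked illustration of the relative quantification, the calculation can be written as below. The function names are hypothetical, and averaging the triplicate Ct values before taking differences is an assumption, since the text does not state how the replicates were combined.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Delta Ct: mean Ct of the target minus mean Ct of the reference gene (HPRT1)."""
    return mean(ct_target) - mean(ct_reference)


def relative_expression(tumor_target, tumor_ref, nontumor_target, nontumor_ref):
    """Tumor versus nontumor fold change by the 2^-(delta delta Ct) method."""
    ddct = delta_ct(tumor_target, tumor_ref) - delta_ct(nontumor_target, nontumor_ref)
    return 2.0 ** (-ddct)
```

For example, a candidate with Ct 24 in tumor and 26 in nontumor tissue, with HPRT1 at Ct 20 in both, gives a ΔΔCt of -2 and therefore a four-fold higher level in the tumor.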

Orientation-specific reverse transcription 

For the orientation-specific cDNA synthesis of ncHDAC5,
ncC11orf49, ncRAB31 and ncSRPK1, 1 µg of purified total
RNA pretreated with DNase I (RNeasy Mini kit, Qiagen)
was used for the reverse transcription reaction. A pool of
RNA from 10 ccRCC samples or from 10 adjacent nontumor
tissue samples was used as template for the cDNA
synthesis. In addition, purified DNase I-treated total RNA
from the kidney tumor cell line 786-O or from the nontumor
cell line RC-124 was used as template. For each sample,
two cDNA synthesis test reactions were performed, each
with 1 µg of total RNA and 500 nM of an oligonucleotide
primer complementary to the sequence of the lncRNA that
would be transcribed from either the sense or the
antisense strand within the corresponding locus of
interest (see PCR primers above). The SuperScript III
Super Mix kit (Invitrogen) was used according to the
manufacturer's instructions. To avoid RNA self-annealing,
RNA and primer were pre-incubated in annealing buffer at
65°C for 10 min, followed by the addition of reverse
transcriptase in enzyme buffer pre-warmed at 57°C; the
reaction was incubated for 50 min at 57°C and denatured
at 95°C for 10 min. To verify the absence of self-annealing
or of genomic DNA contamination, a control reverse
transcription reaction was performed in parallel without
the addition of primers. These test and control samples
were used for end-point PCR (40 cycles) with the pair of
primers for the corresponding lncRNA, as described above.

RACE-PCR 

The Human Fetal Kidney Marathon-Ready cDNA library
(Clontech) and the Marathon cDNA Amplification Kit
(Clontech) were used to perform the 3'- and 5'-RACE-PCR,
following the manufacturer's instructions, with the
following primers: HDAC5_F_GSP_RACE:
AGGAGCCCTGCAGAGAGCACATGG; HDAC5-F_Nested_RACE:
AAGGGGAATCTCCCACCAGCCTGTC; HDAC5-R_GSP_RACE:
GGGGTGCTGCATGTCACCCAGTC; HDAC5-R_Nested_RACE:
TGGAGTCGCCTGTGCTTCCTGTTTG.

RNA stability assay 

786-O cells were maintained at exponential growth in
Dulbecco's modified Eagle's medium (DMEM) containing 10%
calf serum, penicillin, and streptomycin. Actinomycin D
was dissolved in DMSO and added to cells at 5 µg/ml. Cells
were collected at time 0 h (before actinomycin D
treatment), 30 min, 1 h, 2 h and 4 h. Total RNA was
extracted and DNase I-treated with the RNeasy Mini kit
(Qiagen). cDNA was obtained with the SuperScript III Super
Mix kit (Invitrogen) according to the manufacturer's
instructions. These cDNAs were used for real-time PCR with
the pair of primers for ncHDAC5 as described above. As a
control of the assay, the half-life of the C-MYC
transcript was checked, and the expected value of ~30 min
was obtained.
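
The text does not describe how the half-life was derived from the time course. One common approach, shown here purely as an assumption, is to fit first-order decay by least squares on the log-transformed relative transcript levels; the function name and input layout are hypothetical.

```python
from math import log


def half_life(times, levels):
    """Half-life from a least-squares fit of ln(level) against time.

    Assumes first-order decay, level(t) = level(0) * exp(-k * t), and
    returns the half-life in the same unit as `times`.
    """
    logs = [log(v) for v in levels]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx  # equals -k for a decaying transcript
    if slope >= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return log(2) / -slope
```

With the sampling times used here (0, 0.5, 1, 2 and 4 h), a transcript that halves every 30 min would return 0.5 h, which corresponds to the control value reported for C-MYC.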

In silico analyses 

To search for protein-coding potential of the expressed
antisense lncRNAs in renal cancer we used the Coding
Potential Calculator (CPC) software [64] with default
parameters. To search for RefSeq annotation we mapped the
genomic coordinates of our 4303 antisense lncRNA set to
the RefSeq UCSC database (http://genome.ucsc.edu/). To
identify possible precursors of small RNAs among our set
of lncRNAs we cross-referenced the genomic coordinates
of our 4303 antisense lncRNA set to snoRNA [62] and
microRNA [63] databases, using the sno/miRNA
(wgRNA) UCSC track (http://genome.ucsc.edu/). For the
gene expression meta-analysis we used the Oncomine™
Gene Browser software tool (http://www.oncomine.org).

We investigated the co-expression pattern of intronic
antisense lncRNAs and mRNAs, both in cis (lncRNA
and mRNA from the same locus expressed in a given
tissue) and in trans (each lncRNA and all mRNAs
expressed in a tissue). First, we created a list of all the
antisense lncRNAs expressed in RCC, and identified
those that were also expressed in three other tissues,
namely nontumor kidney, normal liver and prostate
tumor human tissues (GEO: GSE5452), using the
normalized microarray expression data previously obtained
by our group [40].

For the in cis correlation analysis, we used the data
from RCC and the other three tissues and calculated the
Spearman correlation (ρ) using the R software environment
(www.r-project.org), with a cutoff of ρ < -0.5 or ρ > 0.5
(p <0.05). We used GraphPad Prism software (GraphPad
Software, La Jolla, California, USA) to obtain the
histogram of the Spearman correlation distribution in cis.
With the BiNGO software [95], we identified enriched Gene
Ontology terms (p <0.05) considering the set of
protein-coding genes co-expressed in cis (ρ < -0.5 or
ρ > 0.5; p <0.05), the GO_Biological_Process ontology file
and the whole human genome annotation default datasets of
BiNGO version 2.44.
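
The correlation itself was computed in R. The following Python sketch only illustrates the statistic and the cutoff for one lncRNA/mRNA pair: the function names are hypothetical, the p-value filter is omitted, and inputs with no variation (all values equal) are not handled.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def passes_cutoff(rho, cutoff=0.5):
    """True when rho < -cutoff or rho > cutoff."""
    return abs(rho) > cutoff
```

Here `x` would hold the expression values of an lncRNA and `y` those of the mRNA from the same locus, measured across the same set of samples.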

For the trans correlation analysis, we only considered
the top 20% most abundant antisense lncRNAs in RCC.
We constructed a correlation matrix (using an R script)
of 693 antisense lncRNAs versus 5438 mRNAs expressed
in RCC and in the other three tissues described above.
Next, we selected the lncRNAs most correlated in trans
(cutoff ρ < -0.7 or ρ > 0.7, p <0.05) and used the Genomica
software (http://genomica.weizmann.ac.il) [96] to identify
among the correlated mRNAs the sets of genes (modules)
that were significantly enriched (p <0.05 with Bonferroni
correction) for a specific Gene Ontology term from the
three ontologies, namely Biological Processes, Molecular
Function and Cellular Component.

For transcription regulatory elements and conservation
analyses, we used the BEDTools software package [97]
to compare the genomic coordinates (hg19, GRCh37) of
our antisense lncRNA dataset with the genomic coordinates
of the following datasets available at the UCSC Genome
Browser: RIKEN CAGE tags [98] from PolyA+ RNA-derived
libraries from 35 cell lines released by the ENCODE
project [65]; predicted CpG islands [66]; HMM active
promoter prediction [59]; RNA Polymerase II binding sites
from the transcription factor ChIP-seq uniform peaks
ENCODE track for 32 human cell lines [65]; ChIP-seq data
of H3K27ac and H3K4me1 DNA binding sites from seven
different human cell lines [59]; H3K4me3, H3K36me3 and
H3K27me3 DNA binding sites from human renal epithelial
cells [60]; RNA-seq data of PolyA+ RNA-derived libraries
from 9 tissues [68]; RNA-seq data of strand-oriented
RNA-derived libraries from 7 cell lines [69].

To test for the statistical significance of the overlap
distribution (see below), we created 10 control datasets
of randomly selected sequences from the entire human
genome matching our set of expressed antisense lncRNA
sequences in number and length. Regulatory elements
mapping up to 1 kb upstream from TSS and 5'UTRs of
RefSeq known transcripts were removed to avoid the
contribution of signals at the start sites of known genes
to the enrichment of regulatory elements at the start
sites of lncRNAs mapping nearby. As a pre-processing
step of the CAGE tag data analysis, only CAGE tags with
RPKM (reads per kilobase per million) > 1 were considered
for further analysis [69]. We computed the distance of the
closest CAGE tags, CpG islands and HMM predicted active
promoter, RNA Pol II, H3K27ac, H3K4me1, H3K4me3 and
H3K27me3 marks to the predicted TSSs of our set of 4303
expressed antisense lncRNAs, 11102 isoforms from 5632
expressed protein-coding mRNAs, and control sets of 4303
random sequences. Regulatory elements located up to 10 kb
from the start of the sequence were considered. For the
H3K36me3 mark, the number of overlapping elements was
recorded. Those records were used to create a distribution
of overlaps for all lncRNAs, binned into 1-kb intervals.
The Kolmogorov-Smirnov (KS) test was used to compare the
continuous probability distributions of abundance of each
relevant genomic mark with those calculated for each of
the 10 control random sets (p-value < 0.05 threshold).
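
The distance, binning and comparison steps can be sketched as follows. This is an illustration under assumptions, not the BEDTools-based workflow used in the study: positions are treated as single coordinates on one chromosome, the function names are hypothetical, and only the KS statistic (not its p-value) is computed.

```python
from bisect import bisect_left


def closest_distance(tss, mark_positions):
    """Absolute distance from a TSS to the nearest mark; mark_positions must be sorted."""
    i = bisect_left(mark_positions, tss)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = mark_positions[max(i - 1, 0):i + 1]
    return min(abs(tss - p) for p in candidates)


def bin_distances(distances, bin_size=1000, max_distance=10000):
    """Count distances per bin of width bin_size, ignoring max_distance or more."""
    counts = [0] * (max_distance // bin_size)
    for d in distances:
        if d < max_distance:
            counts[d // bin_size] += 1
    return counts


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: maximum gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value equal to x before comparing CDFs.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max
```

In this setting `sample_a` would hold the TSS-to-mark distances for the lncRNA set and `sample_b` those of one random control set, with the comparison repeated for each of the 10 controls.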

To evaluate the tissue specificity of antisense lncRNAs
and protein-coding mRNAs expressed in renal cancer, a
meta-analysis including Burge's RNA-seq data from nine
human tissues [68] and strand-oriented Caltech RNA-seq
libraries from seven human cell lines [69] was performed.
For the Burge RNA-seq data, we mapped the reads to the
hg19 reference genome (hg19, GRCh37) using TopHat [99] and
assembled the transcripts using Cufflinks [100]. RefSeq
mRNA (October 2012, UCSC) plus the intronic antisense
lncRNAs included in the 44 k array were used as the
reference transcripts. To determine tissue specificity we
used an approach similar to that of Marques and Ponting
[101]; thus we calculated the fraction of expression in
each tissue (F.E.T.) as the FPKM observed in a specific
tissue divided by the sum of FPKMs in all tissues. To
address statistical significance, we performed Fisher's
exact test comparing the rates of F.E.T. > 0.5 between the
lncRNAs and the protein-coding mRNAs (p < 0.001). The
genomic coordinates of antisense lncRNAs (or mRNAs)
expressed in RCC were overlapped with the coordinates of
the Caltech RNA-seq data to determine if the transcript
was identified in each of the strand-oriented RNA-seq
libraries.
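
The F.E.T. definition translates directly into code. The sketch below uses hypothetical function names and assumes that a transcript with zero FPKM in all tissues is assigned a fraction of 0 everywhere, a case the text does not address.

```python
def fraction_of_expression(fpkm_by_tissue):
    """F.E.T. per tissue: FPKM in that tissue divided by the sum over all tissues."""
    total = sum(fpkm_by_tissue.values())
    if total == 0:
        # Assumption: an unexpressed transcript gets a fraction of 0 in every tissue.
        return {tissue: 0.0 for tissue in fpkm_by_tissue}
    return {tissue: value / total for tissue, value in fpkm_by_tissue.items()}


def is_tissue_specific(fpkm_by_tissue, cutoff=0.5):
    """True if a single tissue accounts for more than `cutoff` of total expression."""
    return max(fraction_of_expression(fpkm_by_tissue).values()) > cutoff
```

A transcript with FPKM 8 in kidney and 1 in each of two other tissues has a kidney F.E.T. of 0.8 and would count toward the F.E.T. > 0.5 rate compared between lncRNAs and mRNAs.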

Conservation of expression pattern analyses were performed
by mapping the sequence coordinates of antisense lncRNAs
expressed in RCC to the coordinates of transcripts
expressed in humans and in 15 other vertebrate species, as
compiled in the TransMap cross-species syntenically mapped
cDNA alignments [70], and recording the hits in each
species. To determine the statistical significance of
expression pattern conservation, we compared the number of
hits against the 15 species obtained for the lncRNA
sequences with the number of hits against the 15 species
obtained for 10 random sets of sequences with lengths
identical to those of the lncRNAs. Fisher's exact test
(p < 0.05 threshold) was used.

Conservation of expression pattern between the intronic
antisense lncRNAs expressed in RCC and intronic antisense
lncRNAs expressed in mouse [38] was identified by
transposing the mouse genomic coordinates to the human
genome using the liftOver tool
(http://genome.ucsc.edu/cgi-bin/hgLiftOver), and the
overlap between these transcripts and the set of intronic
antisense lncRNAs expressed in RCC was determined using
intersectBed from the BEDTools package [97]. The same
analysis was done with the coordinates from a random set of
4303 sequences extracted from the subset of probes with no
evidence of expression in RCC among the entire set of
10,525 intronic antisense lncRNAs probed in the 44 k
array. Fisher's exact test was used to determine
statistical significance (p < 0.05 threshold).

The analysis of DNA sequence conservation was performed
by cross-referencing the human genome coordinates of
antisense lncRNAs expressed in RCC with the coordinates of
PhastCons DNA conserved elements from vertebrates, from
placental mammals and from primates [61], and counting the
number of overlaps. To determine the statistical
significance, the coordinates from the 10 random sets
described above were analyzed in the same way against the
PhastCons dataset. Fisher's exact test (p < 0.05
threshold) was used.
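
The overlap counting and the significance test can be illustrated with a small standard-library sketch. It is not the tooling used in the study: intervals are assumed to be half-open (start, end) pairs on one chromosome, the function names are hypothetical, and the two-sided p-value sums all tables no more likely than the observed one, which is one common definition.

```python
from math import comb


def overlaps_any(interval, elements):
    """True if a half-open (start, end) interval overlaps any element interval."""
    start, end = interval
    return any(start < e_end and e_start < end for e_start, e_end in elements)


def count_overlapping(intervals, elements):
    """Number of intervals that overlap at least one element."""
    return sum(overlaps_any(iv, elements) for iv in intervals)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )
```

The table rows would be the lncRNA set and one random set, and the columns the counts of sequences with and without an overlapping conserved element.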

The RNAz tool [71] was used to predict structurally
conserved and thermodynamically stable RNA secondary
structures. Only predicted structures with P (probability)
> 0.5 were considered as containing conserved secondary
structures [71].

Additional files 



Additional file 1: Table S1. Clinical and pathological data for the 18
clear cell RCC patients analyzed with the 4 k-element cDNA microarrays. 

Additional file 2: Figure S1. Protein-coding gene expression signature
of ccRCC. Heat map of 217 differentially expressed protein-coding genes
(rows) identified in 11 ccRCC patients (columns) (FDR <5%; 1.5-fold
change). Patient ID numbers are indicated at the bottom. Blue indicates
lower expression, and red, higher expression in tumor (T) tissue in
relation to adjacent nontumor (N) tissue.

Additional file 3: Table S2. List of 217 protein-coding genes differen- 
tially expressed in ccRCC in the microarray analysis. 

Additional file 4: Figure S2. Expression signature of intronic lncRNAs
correlated to patient survival in ccRCC. (A) A set of 26 intronic lncRNAs
(rows) identified as differentially expressed (FDR <5%; p <0.01) between
two ccRCC patient groups with distinct outcomes, namely alive and
disease-free or dead from cancer within a 5-year follow-up period after
surgery. Patient samples (columns) are ordered by their correlation
relative to the mean expression profile of the group of patients that
died from cancer. The color code shows higher (red) or lower (blue)
expression relative to the mean expression of that lncRNA in all
patients. (B) Clinical and pathological features: PS, Patient Status
(white = alive disease-free; black = cancer death); T, primary tumor
classification (white = 1a/1b; black = 2/3a/3b/3c); N, regional lymph
node positive for metastasis (white = no; black = yes); M, presence of
metastasis at surgery (white = no; black = yes); Necr, presence of
necrosis (white = no; black = yes); Sz, primary tumor size (white < 7 cm;
black > 7 cm); FG, Fuhrman's nuclear grade (white = II; black = III/IV);
Age, age at surgery (white < 60-year-old; black > 60-year-old); Gend,
Gender (white = female; black = male). (C) Correlation coefficient (r) of
each sample in relation to the average expression profile of all samples
from patients who died from the disease. Patient samples were ordered
according to this correlation.

Additional file 5: Table S3. List of 26 intronic lncRNAs significantly
correlated to RCC patient's survival outcome identified through a cancer- 
related death analysis of the microarray data. 

Additional file 6: Table S4. ncHDAC5 lncRNA half-life measured in a
human renal cell line following transcriptional inhibition with actinomycin
D and ncHDAC5 lncRNA conserved secondary structure predictions
calculated with the RNAz tool.

Additional file 7: Table S5. Clinical and pathological data of the 17 
RCC patients analyzed with the 44 k-element oligoarrays. 

Additional file 8: Table S6. List of 4303 intronic antisense lncRNAs
expressed in RCC. Information is provided for the evolutionary conservation
and for the cis-correlated analysis of expression in four tissues.

Additional file 9: Figure S3. All significantly enriched GO terms
identified for the set of protein-coding genes expressed in RCC and also
in three other human tissues (normal liver, prostate tumor and kidney
nontumor) and cis correlated (ρ < -0.5 or ρ > 0.5; p < 0.05) with
antisense lncRNAs from the same loci. GO enriched terms are organized in
Subgroup I: biological regulation; Subgroup II: cellular process;
Subgroup III: developmental process. Subgroup IV: GO enriched terms for
protein-coding genes only showing positive cis correlation with the
antisense lncRNAs. Subgroup V: GO enriched terms for protein-coding genes
only showing negative cis correlation with the antisense lncRNAs. Color
scale indicates increasingly higher statistical significance of enriched
GO terms: Yellow, p =0.05; Dark orange, p <0.0001.

Additional file 10: Table S7. GO enriched terms from cis or trans
correlation analyses. (A) GO enriched terms for all protein-coding genes
with significant cis correlation with the lncRNA from the same locus. (B, C)
GO enriched terms for protein-coding genes with significant cis
correlation with the lncRNA from the same locus, for those only with (B)
positive correlation or (C) negative correlation among lncRNAs and
protein-coding genes. (D) Average trans correlation value for the
protein-coding gene set expressed in RCC and three other tissues,
belonging to that GO-enriched term, and having significant trans
correlation with a lncRNA. (E) Correlation values for all lncRNAs with
significant trans correlation with all the protein-coding genes expressed
in RCC plus three other tissues. (F) Protein-coding genes expressed in RCC
plus three other tissues, belonging to that GO-enriched term in the trans
correlation analysis.

Additional file 11: Figure S4. Heat map of trans-correlated expression
among the 20% most abundant antisense lncRNAs (n = 693) expressed in
RCC and three other tissues and the 5293 protein-coding mRNAs expressed
from different loci. A yellow entry indicates a Spearman correlation
ρ > 0.7; a blue entry indicates a Spearman correlation ρ < -0.7; a black
entry indicates all other correlation values. A total of 693 antisense
lncRNAs and 5293 mRNAs expressed in the four tissues were considered in
the trans-correlation analysis.

Additional file 12: Figure S5. Tissue expression pattern of
protein-coding genes. (A) Heat map representing abundance of 5632
RCC-expressed protein-coding genes (columns) across nine other human
tissues (rows) from public RNA-seq libraries [68]. Color intensity
represents fractional density expression of each lncRNA across all
tissues (see Material and methods for details). (B) Heat map indicating
presence (red) or absence (white) of 5298 RCC-expressed protein-coding
genes (columns) across seven human cell lines (rows) from public
strand-oriented RNA-Seq libraries [69]. Expression data was
hierarchically clustered.